Compilation process in C

Introduction

Like most programming languages, C cannot be understood directly by a microprocessor. To bridge this gap, the instructions written by humans must be converted into machine code that the microprocessor can execute. This is where the compiler comes into play.

This article discusses the compilation process in C in detail, including why it matters. It also covers some of the commonly used compiler options.

So let’s get started!

Definition of Compilation Process

The compilation process in C can be defined as a multi-step process responsible for converting source code into object code. Along the way, it checks the source code for syntactical and structural errors, and it produces the object code only when no such errors are found.

To make this easier to picture, let’s take a look at a small example. Suppose there are two individuals, A and B. A is well-versed in English, but B only knows Spanish. To let the two of them hold a conversation, we need a translator. In programming, the compiler plays the role of that translator between the programmer and the machine.

Importance of understanding the Compilation Process in C programming language

Gaining a thorough understanding of the compilation process in C is essential for several reasons, some of which are provided below. 

  • Debugging and Troubleshooting - An in-depth understanding of the compilation process can help you identify and fix errors in your code. You can also interpret error messages more efficiently and proactively prevent unexpected behaviours during the compilation phase.

  • Efficiency and Optimization - Knowledge of the compilation process in C enables you to write more efficient code and optimise its performance. It also helps you make informed decisions about data types and memory management, which ultimately affect the program’s efficiency.

  • Security Considerations - Lastly, understanding the compilation process in C helps you address and rectify potential security vulnerabilities in your code, such as buffer overflows and injection attacks.

Steps in the Compilation Process

The compilation process in C typically consists of four main steps:

  • Preprocessing

  • Compilation

  • Assembly

  • Linking 

Preprocessing

Preprocessing is the initial phase of the compilation process in C. It is carried out by the preprocessor, which handles directives starting with the # symbol and performs the tasks described below.

  • Comments Removal

Comments play no role in the compiled code. They exist to make the code easier for humans to read and maintain. Therefore, in the preprocessing phase of the compilation process in C, all comments indicated by ‘//’ or ‘/* */’ are removed.

/* This is a 
 multi-line comment in C */

#include<stdio.h>

int main()
{
    // this is a single-line comment in C
    
    return 0;
}
  • Macro Expansion

Macros are defined using the ‘#define’ directive. In the preprocessing stage, every macro reference in the source code is replaced with its corresponding definition. This makes it much easier to define and reuse constants or small snippets of code.

Defining a value

#define G 9.8

Defining an expression

#define SUM(a,b) ((a) + (b))
  • File inclusion

File inclusion is another important task carried out during the preprocessing stage, with the help of the ‘#include’ directive. It involves copying the contents of the included file into the source code at the point of inclusion. This enables users to access functions, macros, declarations, and any other code or data defined in the included file, facilitating code modularity and reuse.

In C programming, when we need to use basic functions like printf() and scanf(), we must include the pre-defined standard input/output header file called stdio.h.

#include <stdio.h>
  • Conditional Compilation

Lastly, conditional compilation is carried out using the ‘#if’, ‘#ifdef’, ‘#ifndef’, ‘#elif’, ‘#else’, and ‘#endif’ directives. It is responsible for including or excluding certain code sections based on specific conditions.

#include <stdio.h>

// If we uncomment the line below, the program will print the value of YEAR.
// #define YEAR 1990

int main()
{
    // If YEAR is defined, print its value; otherwise print "Not Defined".
#ifdef YEAR
    printf("Year is %d", YEAR);
#else
    printf("Not Defined");
#endif

    return 0;
}

Output

Not Defined

The #ifdef directive checks whether the macro YEAR is defined. In this case, since the #define statement for YEAR is commented out, the code within the #ifdef YEAR block is excluded from compilation. Instead, the #else block is compiled, so "Not Defined" is printed on the output screen. The #endif directive terminates the conditional compilation block.

Compilation

Following the preprocessing stage, the expanded code is passed on to the compiler. Simply put, the main responsibility of the compiler is to convert the high-level language into assembly code. The assembly code is later converted into machine code, in the assembly stage, so that the machine can execute it.

Similar to the preprocessing phase, the compilation stage also consists of various tasks. These include:

  • Lexical Analysis

Also sometimes referred to as scanning, this stage tokenises the preprocessed source code. Tokens are the smallest meaningful units obtained by breaking down the code. They include keywords, identifiers, operators, punctuation symbols, and constants, among others.

  • Syntax Analysis

Once the tokens have been generated, they are analysed to check whether they conform to the grammar rules of the C language. This phase confirms the structure and arrangement of the tokens, verifying that they form valid expressions, statements, and program structures.

  • Semantic Analysis

Semantic analysis, as the name suggests, is responsible for identifying errors in the meaning of the code, such as type mismatches or the use of undeclared names. It covers functions, variables, types and expressions, ensuring that they are used consistently throughout the program.

  • Intermediate Code Generation

Following the semantic analysis stage, an intermediate code is generated in some cases. One main reason for this is to facilitate further optimisation before actually generating machine code. 

  • Optimisation

Various optimisation techniques, such as constant folding and loop optimisation, are carried out to increase the efficiency of the compiled code. Optimisation can be applied either to the intermediate code or directly to the analysed code.

  • Code Generation

Lastly, the analysed code, or the intermediate representation, is translated into low-level assembly code that is specific to the target platform. It consists of assembly instructions that correspond to operations the computer’s hardware can perform.

Assembly

The assembly stage is an intermediate phase in the compilation process of C, following the compilation stage and preceding the linking stage. During this stage, the assembly file generated in the compilation stage is converted into machine code that is specific to the target architecture. This is carried out by a program called the assembler. It reads the assembly file, interprets the instructions, and then generates object code.

There are two main components of the assembly stage:

Assembly Language - A low-level programming language whose instructions correspond closely to the instruction set of the target processor.

Assembly File - A file containing the assembly code, which is a textual representation of the machine instructions.

Linking

Finally, we move on to the last step of the compilation process in C: linking. It is a critical stage since it generates the final output that will be run or used by other programs. During linking, all the object files and libraries are combined to create the final executable file (.exe on Windows) or a shared library. It involves various tasks such as symbol resolution, address fix-up (relocation), library resolution, and output generation, among others.

Compilation Process In C: Example

Let’s explore a small example to help you understand the compilation process in C.

Suppose we have a single C source file, named ‘main.c’, that contains the following code,

#include <stdio.h>
int main() {
   printf("Hello, World!");
   return 0;
}

Preprocessing - During this stage, the source code is scanned and processed for file inclusion, macro expansion and removal of comments. The result is saved as ‘main.i’.

Compilation - The compiler then translates the preprocessed code into assembly language. It is also during this stage that lexical analysis, syntax analysis, semantic analysis, and optimisation are performed. The compiled code is saved as ‘main.s’.

Assembly - With the help of the assembler, the assembly code is converted into machine code. It generates an object file named ‘main.o’, which contains the binary representation of the code.

Linking - Finally, the object file, ‘main.o’, is combined with any other required libraries with the help of the linker. It resolves symbol references and generates the final executable file, named ‘main.exe’.

When you run ‘main.exe’, you will receive the desired output, “Hello, World!”

Understanding Compiler Options

Compiler options in C are command-line flags passed to the compiler during the compilation process. They provide additional instructions or configuration that affect the final executable code. They are also sometimes referred to as compiler flags.

With their help, you can customise the compilation process, enable warnings, and control optimisations. Some of the most commonly used options are described below.

Optimization 

‘-O0’, ‘-O1’, ‘-O2’, ‘-O3’: Each signifies different optimization levels. For example, while ‘-O0’ disables optimization, ‘-O3’ provides a high level of optimization. 

Warning 

‘-Wall’: It enables a broad set of additional warnings during the compilation process. It acts as a safeguard against questionable or error-prone code constructs.

‘-Werror’: It treats all warnings as errors. Thus, if any warning is generated, the compilation process will fail.

Debugging 

‘-g’: It includes debugging information in the output, which allows a debugger to step through the code at runtime.

‘-DDEBUG’: It defines the macro DEBUG, which facilitates the conditional compilation of debug-specific code.

Output 

‘-o <output>’: It specifies the output name for the compiled executable.

‘-c’: It compiles the source code into an object file and stops there, without performing linking.

That being said, these are not the only compiler options available in C. Several other compiler flags are used, each with its own functionalities. These include preprocessor options, linking options, and language options. 

Conclusion

To sum up, understanding the compilation process in C is important for many reasons. From debugging and troubleshooting to better optimisation and efficiency, it gives users an in-depth understanding of how human-readable code is converted into machine-readable code.

To know more about the same, do not forget to check out upGrad’s MS in Computer Science program in association with Liverpool John Moores University. From just-in-time interviews to 1:1 high-performance coaching, it provides a plethora of benefits to its students.

FAQs

Q1: Can you state the five stages of compilation?

There are typically five stages of compilation: lexical analysis, syntax analysis, semantic analysis, optimisation, and code generation. Intermediate code generation is sometimes counted as an additional stage.

Q2: What is the function of a compiler in C?

A compiler can be defined as a special program responsible for converting source code into machine code. The source code is written in a human-readable language, in this case C.

Q3: What are the different types of compilers?

Broadly speaking, there are several types of compilers. These include cross compilers, native compilers, Just-In-Time compilers, source-to-source compilers, and optimizing compilers, among others.
