Lexical Analyzer

Modules and Algorithms:

Functions:

Verify(): The lexical analyzer uses the verify operation to determine whether there is an entry for a lexeme in the symbol table.

Data structures:

Structure: A structure in C is a collection of logically related variables that may be of similar or dissimilar data types. Each variable in the structure represents one item and is called a member or field of the structure. Structures let complex data be represented in a more meaningful way and are one of the very useful features available in C.

An array of structures is basically an array of records. Each and every record has the same format, consisting of logically related members of similar or dissimilar data types.
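As an illustration of how an array of structures can serve as the symbol table that verify consults, here is a minimal sketch in C. The names `struct symbol`, `symbol_table`, `TOK_KEYWORD` and `TOK_IDENTIFIER` are assumptions made for this example, the keyword list is deliberately incomplete, and treating every lexeme that is missing from the table as an identifier is also an assumption, not something the article specifies.

```c
#include <stddef.h>
#include <string.h>

/* Token classes reported by verify(); the names are illustrative. */
enum token_class { TOK_IDENTIFIER, TOK_KEYWORD };

/* One record of the symbol table: a lexeme and its class. */
struct symbol {
    const char *lexeme;
    enum token_class cls;
};

/* Array of structures pre-loaded with a few C keywords (not the full list). */
static const struct symbol symbol_table[] = {
    { "if", TOK_KEYWORD },   { "else", TOK_KEYWORD }, { "while", TOK_KEYWORD },
    { "for", TOK_KEYWORD },  { "int", TOK_KEYWORD },  { "char", TOK_KEYWORD },
    { "return", TOK_KEYWORD },
};

/* Linear scan of the table. A lexeme with no entry is assumed to be
   an identifier (assumption of this sketch). */
enum token_class verify(const char *lexeme)
{
    size_t n = sizeof symbol_table / sizeof symbol_table[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(symbol_table[i].lexeme, lexeme) == 0)
            return symbol_table[i].cls;
    }
    return TOK_IDENTIFIER;
}
```

Calling `verify("while")` from a driver reports a keyword, while `verify("count")` falls through to the identifier case. A linear scan is enough for a table this small; a larger table would more likely use a sorted array or a hash table.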

Algorithms:

Procedure main
begin
  if symb = '#' then
  begin
    advance to next token in input file
    if symb = 'i' then
    begin
      advance to next token in input file
      while symb != '\n' do
      begin
        advance to next token in input file
      end {while}
      print symb is a preprocessor directive
    end {if symb = 'i'}
    if symb = 'd' then
    begin
      advance to next token in input file
      while symb != ' ' do
      begin
        advance to next token in input file
      end {while}
      advance to next token in input file
      print symb is a constant
      advance to next token in input file
      while symb != '\n' do
      begin
        advance to next token in input file
      end {while}
    end {if symb = 'd'}
  end {if symb = '#'}
  if symb is a letter or symb = '_' then
  begin
    advance to next token in input file
    while symb is a digit or a letter or symb = '_' do
    begin
      advance to next token in input file
    end {while}
    call function verify to check whether symb is an identifier or a keyword
  end {if}
  if symb = '+' then
  begin
    advance to next token in input file
    if symb = '+' then
      print symb is ++ operator
    else
    begin
      ungetc symb back to the input file
      print symb is + operator
    end {else}
  end {if}
  if symb = '-' then
  begin
    advance to next token in input file
    if symb = '-' then
      print symb is -- operator
    else
    begin
      ungetc symb back to the input file
      print symb is - operator
    end {else}
  end {if}
  if symb = '|' then
  begin
    advance to next token in input file
    if symb = '|' then
      print symb is logical or operator
    else
    begin
      ungetc symb back to the input file
      print symb is bitwise or operator
    end {else}
  end {if}
  if symb = '*' then
  begin
    print symb is a multiplication operator
  end {if}
  if symb = '?' then
  begin
    print symb is a conditional operator
  end {if}
  if symb = '!' or symb = '>' or symb = '<' then
  begin
    advance to next token in input file
    if symb = '=' then
      print symb is a relational operator
    else
    begin
      ungetc symb back to the input file
      print symb is an operator
    end {else}
  end {if}
  if symb = '=' then
  begin
    advance to next token in input file
    if symb = '=' then
      print symb is equal to operator
    else
    begin
      ungetc symb back to the input file
      print symb is assignment operator
    end {else}
  end {if}
  if symb = '&' then
  begin
    advance to next token in input file
    if symb = '&' then
      print symb is a logical and operator
    else
    begin
      ungetc symb back to the input file
      print & is an address operator
    end {else}
  end {if}
  if symb = '/' then
  begin
    advance to next token in input file
    if symb = '*' then
    begin
      {block comment: skip everything up to the closing */}
      advance to next token in input file
      while not (previous symb = '*' and symb = '/') do
        advance to next token in input file
      end {while}
    end {if}
    else if symb = '/' then
    begin
      {line comment: skip to the end of the line}
      advance to next token in input file
      while symb != '\n' do
        advance to next token in input file
      end {while}
    end {else if}
    else
    begin
      ungetc symb back to the input file
      print symb is a division operator
    end {else}
  end {if}
  if symb is a digit then
  begin
    advance to next token in input file
    while symb is a digit or symb = '.' do
    begin
      advance to next token in input file
    end {while}
    print symb is a number
  end {if}
  if symb = '"' then
  begin
    advance to next token in input file
    while symb != '"' do
    begin
      advance to next token in input file
    end {while}
    print symb is a string
  end {if}
  if symb = '{' then
    print open brace
  if symb = '}' then
    print close brace
  if symb = '[' then
    print open bracket
  if symb = ']' then
    print close bracket
  if symb = '(' then
    print open parenthesis
  if symb = ')' then
    print close parenthesis
end {procedure main}
procedure verify
begin
  scan the symbol table to check if encountered token exists
  if exists
    return token value
end {procedure}
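The one-character lookahead that the pseudocode uses for `+` versus `++` and `|` versus `||` can be sketched in C with `fgetc` and `ungetc`. The function name `scan_operator` and the strings it returns are assumptions made for this example; it covers only these two operator families, not the whole procedure above.

```c
#include <stdio.h>

/* Reads one '+' or '|' operator from `in` using one character of lookahead.
   Returns a description of the operator, or NULL if the next character
   starts neither operator (that character is pushed back) or at end of file. */
const char *scan_operator(FILE *in)
{
    int c = fgetc(in);
    if (c == EOF)
        return NULL;
    if (c != '+' && c != '|') {
        ungetc(c, in);              /* not ours: leave it for the caller */
        return NULL;
    }

    int next = fgetc(in);           /* look one character ahead */
    if (next == c)
        return c == '+' ? "++ operator" : "logical or operator";

    if (next != EOF)
        ungetc(next, in);           /* push the lookahead back onto the stream */
    return c == '+' ? "+ operator" : "bitwise or operator";
}
```

On the input `+++|||`, successive calls yield the `++` operator, the `+` operator, the logical or operator and the bitwise or operator. Only one pushed-back character is ever pending, which is the amount `ungetc` is guaranteed to support.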

Editorial Team

27 thoughts on “Lexical Analyzer”

  1. Hey can plz mail me the lex code to identify different tokens from a c code..
    thanx in advance..

  2. \\ Token Separation \\

    Write a program to identify and generate the tokens present in the given input
    #include
    #include
    #include
    #include

    int key = 0;
    char expr[100];
    char cont[][20]={"CONTROLS","for","do","while","NULL",};
    char cond[][20]={"CONDITION","if","then","NULL"};
    char oprt[][20]={“OPERATOR”,”+”,”-“,”*”,”/”,”%”,”<","”,”>=”,”=”,”(“,”)”,”NULL”};
    char branch[][20]={"BRANCHING","goto","jump" ,"NULL"};
    void checking(char[],char[][20]);
    void main()
    {
    int i,j,l,k,m,n;
    char sbexpr[50],txt[3];
    clrscr();
    cout<=97 && sbexpr[m]=65 && sbexpr[m]<=90)))
    {
    cout<<"\n"<<sbexpr[m]<“<<"Identifier\n";
    key = 1;
    }
    }
    }
    if(key == 0)
    {
    cout<<"\n"<<sbexpr<“<<"Address\n";
    key = 1;
    }
    }

    getch();
    }

    void checking (char expr[],char check[][20])
    {
    for(int i=1;strcmp(check[i],"NULL")!=0;i++)
    {
    if(strcmp(expr,check[i])==0)
    {
    cout<<expr<“<<check[0]<<"\n";
    key = 1;
    }
    }
    }

  3. #include
    #include
    #include
    #include

    void Open_File();
    void Demage_Lexeme();
    int Search(char[256],int);
    void analyze();
    void Skip_Comment();
    void Read_String();
    void Is_Keyword_Or_Not();
    void Is_Identifier_Or_Not();
    void Is_Operator_Or_Not();
    void Read_Number();
    void Is_Special_Or_Not();
    void Is_Comparison_Or_Not();
    void Add_To_Lexical (char[256],int,char[256]);
    void Print_ST();
    void Print_TOKEN();
    void Token_Attribute();
    struct lexical
    {
    char data[256]; //Value of token.
    int line[256]; //Line numbers at which the token appears in the input file.
    int times; //Number of times the token appears in the input file.
    char type[256]; //Type of each token.
    struct lexical *next;
    };

    typedef struct lexical Lex;
    typedef Lex *lex;

    /****************************************************************
    File pointer for accessing the file.
    *****************************************************************/

    FILE *fp;
    FILE *st;
    FILE *token;
    char lexeme[256],ch;
    int f,flag,line=1,i=1;
    lex head=NULL,tail=NULL;

    /****************************************************************
    Array holding all keywords for checking.
    *****************************************************************/

    char
    *keywords[]={"procedure","is","begin","end","var","cin","cout","if",
    "then","else","and","or","not","loop","exit","when",
    "while","until"};

    /****************************************************************
    Array holding all arithmetic operations for checking.
    *****************************************************************/

    char arithmetic_operator[]={'+','-','*','/'};

    /****************************************************************
    Array holding all comparison operations for checking.
    *****************************************************************/

    char *comparison_operator[]={“”,”=”,”<=","”,”>=”};

    /****************************************************************
    Array holding all special characters for checking.
    *****************************************************************/

    char special[]={'%','!','@','~','$'};

    /****************************************************************

    **************
    *MAIN PROGRAM*
    **************

    *****************************************************************/

    void main()
    {
    Open_File();
    analyze();
    fclose(fp);
    Print_ST();
    Print_TOKEN();
    }

    /****************************************************************
    This function opens the input source file.
    *****************************************************************/

    void Open_File()
    {

    fp=fopen("source.txt","r"); //provide path for source.txt here
    if(fp==NULL)
    {
    printf("!!!Can't open input file - source.txt!!!");
    getch();
    exit(0);
    }
    }

    /****************************************************************
    Function to add an item to the linked list of structures that stores
    the data and information of lexical items.
    *****************************************************************/

    void Add_To_Lexical (char value[256],int line,char type[256])
    {
    lex new_lex;

    if (!Search(value,line)) //Search returns 0 when the token is not found.
    {

    new_lex=malloc(sizeof(Lex));

    if (new_lex!=NULL)
    {
    strcpy(new_lex->data,value);
    new_lex->line[0]=line;
    new_lex->times=1;
    strcpy(new_lex->type,type);
    new_lex->next=NULL;

    if (head==NULL)
    head=new_lex;
    else
    tail->next=new_lex;

    tail=new_lex;
    }
    }
    }

    /****************************************************************
    Function to search token.
    *****************************************************************/

    int Search (char value[256],int line)
    {
    lex x=head;
    int flag=0;

    while (x!=NULL && !flag) //Visit every node; also safe when the list is empty.
    {
    if (strcmp(x->data,value)==0)
    {
    x->line[x->times]=line;
    x->times++;
    flag=1;
    }
    x=x->next;
    }
    return flag;
    }

    /****************************************************************
    Function to print the ST.TXT .
    *****************************************************************/

    void Print_ST()
    {
    lex x=head;
    int j;

    if ((st=fopen("ST.TXT","w"))==NULL)
    printf("The file ST.TXT cannot be opened.\n");

    else

    {
    fprintf(st," %s %s %s\n","Line#","Lexeme","Type");
    fprintf(st," ---- ------ ----\n");

    while (x!=NULL)
    {
    if ((strcmp(x->type,"num")==0) ||
    (strcmp(x->type,"keyword")==0) ||
    (strcmp(x->type,"identifier")==0))
    {
    fprintf(st," ");

    for (j=0;j<x->times;j++)
    {
    fprintf(st,"%d",x->line[j]);
    if (j!=x->times-1) //Do not print a comma
    fprintf(st,","); //after the last line number.
    }

    fprintf(st," %-6s %-6s\n",x->data,x->type);
    }
    x=x->next;
    }

    fclose(st);
    }
    }

    /****************************************************************
    Function to print the TOKENS.TXT .
    *****************************************************************/

    void Print_TOKEN()
    {
    int flag=0;

    fp=fopen("source.txt","r");

    if(fp==NULL)
    {
    printf("!!!Can't open input file - source.txt!!!");
    getch();
    exit(0);
    }

    else

    {
    if ((token=fopen("TOKENS.TXT","w"))==NULL)
    printf("The file TOKENS.TXT cannot be opened.\n");

    else

    {
    ch=fgetc(fp);

    while (!(feof(fp)))
    {

    if (ch==' ' && !flag)
    {
    do
    ch=fgetc(fp);
    while (ch==' ');

    fseek(fp,-2,1);
    ch=fgetc(fp);
    flag=1;
    }

    if (ch!='\n' && ch!=' ')
    fprintf(token,"%c",ch);

    if (ch=='\n')
    {
    fprintf(token,"\n");
    Token_Attribute();
    i++;
    flag=0;
    }

    ch=fgetc(fp);
    }
    }
    }
    fclose(fp);
    if (token!=NULL) //TOKENS.TXT may have failed to open.
    fclose(token);
    }

    /****************************************************************
    Function to put the token and attribute in TOKENS.TXT .
    *****************************************************************/

    void Token_Attribute()
    {
    lex x=head;
    int j;

    while (x!=NULL)
    {
    if (x->line[0]==i)
    {
    fprintf(token,"token : %-4s ",x->type);

    if ((strcmp(x->type,"num")==0) ||
    (strcmp(x->type,"keyword")==0) ||
    (strcmp(x->type,"identifier")==0))

    {
    fprintf(token,"attribute : line#=%-4d\n",i);
    }

    else

    {
    fprintf(token,"attribute : %-4s\n",x->data);
    }

    }
    x=x->next;
    }
    fprintf(token,"\n");
    }

    /****************************************************************
    Function to create lexical analysis.
    *****************************************************************/

    void analyze()
    {

    ch=fgetc(fp); //Read character.

    while(!feof(fp)) //While the end of the file is not reached.
    {

    if(ch=='\n') //Count the lines in source.txt.
    {
    line++;
    ch=fgetc(fp);
    }

    if(isspace(ch) && ch=='\n' )
    {
    line++;
    ch=fgetc(fp);
    }
    if(isspace(ch) && ch!='\n' ) //The character is space.
    ch=fgetc(fp);

    if(ch=='/' || ch=='"') //Skip comments in the file
    Skip_Comment(); //and '"' with display statements.

    if(isalpha(ch)) //The character is a letter.
    {
    Read_String();
    Is_Keyword_Or_Not();
    Is_Operator_Or_Not();
    Is_Identifier_Or_Not();
    }

    if(isdigit(ch)) //The character is a digit.
    Read_Number();

    if (ch==';') //The character is a semicolon.
    Add_To_Lexical(";",line,"semicolon");

    if (ch==':') //The character is a colon.
    Add_To_Lexical(":",line,"colon");

    if (ch==',') //The character is a comma.
    Add_To_Lexical(",",line,"comma");

    if (ch=='(') //The character is a parenthesis.
    Add_To_Lexical("(",line,"parenthesis");

    if (ch==')') //The character is a parenthesis.
    Add_To_Lexical(")",line,"parenthesis");

    //The character is comparison_operator
    if (ch==”)
    Is_Comparison_Or_Not();

    Is_Special_Or_Not(); //After scanning failed in the cases above,
    //check whether the character is special.
    Demage_Lexeme();

    if(isspace(ch) && ch=='\n' )
    {
    line++;
    ch=fgetc(fp);
    }
    else
    ch=fgetc(fp);
    }
    }

    /****************************************************************
    This function reads all characters of a string.
    *****************************************************************/

    void Read_String()
    {
    int j=0;

    do
    {
    lexeme[j++]=ch;
    ch=fgetc(fp);
    } while(isalpha(ch));

    fseek(fp,-1,1);
    lexeme[j]=’

  4. Can you tell how many tokens the following statements consist of:

    printf("Test of %d tokens %d",a,b);

    printf("TEST");

    ??

    I am a bit confused whether %d will be treated as separate tokens or not?

    And for the literal, whether the quotation marks (") will be counted as tokens or not?

  5. plz send me the source code in lex to identify different parts of speech by making use of symbol table.

  6. 2.Write a program in LEX to count the no of:
    (i) positive and negative integers
    (ii) positive and negative fractions.
    For C and C++ source programs

  7. would u give an answer for this

    1. Write a program in LEX to count the no of:
    (i) positive and negative integers
    (ii) positive and negative fractions.
    For C and C++ source programs
    1. Write a LEX program to recognize a valid C and C++ programs if u can include if condition, loop.

  8. Kindly send me a program that will accept expressions from the user and define the keywords, identifiers, number, relational operator and etc.

    Sample output:
    Enter expression:
    scanf("%s", exp);

    scanf is a keyword
    ( is a punctuation
    " is a punctuation
    %s
    " is a punctuation
    , is a punctuation
    exp is an identifier
    ) is a punctuation
    ; is a punctuation

    mail me on joeann_espanol@yahoo.com

    thank’z

  9. I also need a lexical analyzer program, the same as the program above.

    sample output:
    m=1* (a+v % 3);
    m is a variable
    = is an assignment operator
    1 is a number
    ( is a punctuation
    a is a variable
    + is an arithmetic operator
    v is a variable
    % is an arithmetic operator
    3 is a number
    ) is a punctuation
    ; is a separator
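On the token-counting question raised in comment 4: in C a string literal, including its quotation marks and any `%d` inside it, is a single token. The sketch below counts tokens on that basis. The function name `count_tokens` is an assumption made for this example, and the sketch is deliberately simplified: it treats every other non-space character as a one-character token, so multi-character operators, character constants and comments are not handled.

```c
#include <ctype.h>

/* Counts tokens in a fragment of C source. Identifiers/keywords, numbers and
   string literals each count once; any other non-space character counts as a
   one-character token (simplification of this sketch). */
int count_tokens(const char *s)
{
    int count = 0;
    while (*s != '\0') {
        unsigned char c = (unsigned char)*s;
        if (isspace(c)) {
            s++;
            continue;
        }
        if (isalpha(c) || c == '_') {
            /* identifier or keyword */
            while (isalnum((unsigned char)*s) || *s == '_')
                s++;
        } else if (isdigit(c)) {
            /* number, possibly with a decimal point */
            while (isdigit((unsigned char)*s) || *s == '.')
                s++;
        } else if (c == '"') {
            s++;                              /* opening quote */
            while (*s != '\0' && *s != '"') {
                if (*s == '\\' && s[1] != '\0')
                    s++;                      /* skip an escaped character */
                s++;
            }
            if (*s == '"')
                s++;                          /* closing quote */
        } else {
            s++;                              /* one-character token */
        }
        count++;
    }
    return count;
}
```

Under these rules `printf("Test of %d tokens %d",a,b);` has 9 tokens (`printf`, `(`, the string literal, `,`, `a`, `,`, `b`, `)`, `;`) and `printf("TEST");` has 5.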
