8. Compiling and running basics

8.1. Motivation and plan

This chapter is a review, to make sure we are comfortable with the business of building and running software from source code.

As usual I focus on Python and C.

Note that although I discuss aspects of the languages (Python and C), the focus here is on how to build programs, and on the interactions with the file system. That’s what this hacker’s compendium is mostly about: not “pure” programming language information, but rather tricks to be really productive when programming on your operating system.

8.2. Python

You don’t need to worry about compiling python code, but you should structure your python programs to execute cleanly. You can do this by having the following structure to your program:

Listing 8.2.1 simple-program.py – not much here, just demonstrates a way to set up a main() function.
#! /usr/bin/env python3

def main():
    print('this is my main program')
    ## [all the rest of your main program]

## [other functions]

if __name__ == '__main__':
    main()

You should then make the program executable and run it with:

$ chmod +x simple-program.py
$ ./simple-program.py
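
The guard at the end of Listing 8.2.1 depends on the variable __name__, and it can help to see its two possible values once. The following is only a sketch: it writes a tiny throwaway file (the name show_name.py is made up for this illustration), runs it directly, then imports it as a module.

```python
import os
import subprocess
import sys
import tempfile

## a one-line program that reports the value of __name__
source = "print('__name__ is', __name__)\n"
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'show_name.py')   ## hypothetical file name
with open(path, 'w') as f:
    f.write(source)

## case 1: run the file directly, much like ./simple-program.py
direct = subprocess.run([sys.executable, path],
                        capture_output=True, text=True).stdout.strip()

## case 2: import the very same file as a module
sys.path.insert(0, workdir)
import show_name                ## the print() inside runs on import
imported = show_name.__name__

print('run directly:', direct)
print('imported:    ', imported)
```

In the first case the file sees '__main__', in the second it sees its own module name, which is why main() only gets called when the file is run as a program.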

In python you don’t have the kind of separate compilation and linking of binary files that you have with languages like C, C++ and FORTRAN, but you can and should have your code distributed between separate files when it starts growing. This offers a nice conceptual separation, and avoids too-long source files.

An example could be in the following three files:

Listing 8.2.2 prog-with-functions.py – a program which calls two functions which are stored in separate files.
#! /usr/bin/env python3

import file_with_f1
import file_with_f2

def main():
    print('this is my main program; I will call f1')
    print('result:', file_with_f1.f1())
    print('and now I will call f2')
    print('result:', file_with_f2.f2())

## [other functions]

if __name__ == '__main__':
    main()
Listing 8.2.3 file_with_f1.py – a module with the function f1() in it.
def f1():
    return 42
Listing 8.2.4 file_with_f2.py
import math

def f2():
    """f2() returns the sin of PI/6"""
    return math.sin(math.pi/6.0)  ## sin and pi are defined in math

Some things to note here:

  • What’s with that snippet:

    [...]
    if __name__ == '__main__':
        main()
    

    at the end of the file with the main program? You don’t need to remember this, but run through it once: if the file is executed directly then the global variable __name__ is set to the string '__main__', which means we want to call main(). If the file is instead imported as a module, __name__ is set to the module’s name and main() is not called. That’s why we put that litany at the end of python programs.

  • The filename file_with_f1.py has underscores instead of hyphens. This is because the hyphen character is the same as a minus sign, so python syntax would not let you have something like import file-with-f1. The main program can have hyphens in its name, but if there’s any chance that you might some day turn it into a module and call it from another program, you should use underscores there too.

  • Once you import the module you call the functions with the syntax MODULE.FUNC(), for example file_with_f1.f1(). The import instruction has other features so you can abbreviate the name of the module, or even skip it altogether, or selectively import only some functions from the file.

  • The files file_with_f1.py and file_with_f2.py need to be in the same directory as the program that calls them. You can put them in a different directory if you add that directory to the python list sys.path.
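
The last two points in the list above can be tried out together. This is a minimal sketch, not the only way to do it: it puts a copy of file_with_f1.py in a temporary directory (standing in for "some other directory" on your disk), adds that directory to sys.path, and then imports the module in three different styles.

```python
import os
import sys
import tempfile

## put a copy of file_with_f1.py in some other directory; here a
## temporary directory plays that role (an assumption for the demo)
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'file_with_f1.py'), 'w') as f:
    f.write('def f1():\n    return 42\n')

## tell python to look for modules in that directory as well
sys.path.append(module_dir)

import file_with_f1             ## plain import: call as MODULE.FUNC()
import file_with_f1 as ff1      ## abbreviate the name of the module
from file_with_f1 import f1     ## import just one function, no prefix

results = [file_with_f1.f1(), ff1.f1(), f1()]
print(results)
```

All three calls reach the same function. Which style to use is mostly a matter of taste; the plain form makes it obvious where a function comes from.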

8.3. C

8.3.1. Flow of compiling and linking

In C (and C++ and FORTRAN and other languages that are typically compiled) your program will often consist of one or more .c files. When you first learn to program in C you might have a single file my-prog.c:

Listing 8.3.1.1 my-prog.c
#include <stdio.h>
int main()
{
  printf("hello world\n");
  return 0;
}

and you might compile it and run it like this:

$ gcc my-prog.c -o my-prog
$ ./my-prog
hello world
$

But as your program grows bigger you will find that “separate compilation” is a big deal in C: you will often have many files that you compile separately, after which you link them together into a single executable. The process for a single source file might look like this:

digraph {

   {
   rank=source "create C main\nprogram my-prog.c"
   }
   {
   rank=sink "run it with ./my-prog"
   }


   edge [lblstyle="above, sloped"];
   "create C main\nprogram my-prog.c" ->
   "compile it with\ngcc my-prog.c -o my-prog"
   [label="single C file"];
   "compile it with\ngcc my-prog.c -o my-prog" ->
   "run it with ./my-prog" ->
   "make changes to my-prog.c" ->
   "compile it with\ngcc my-prog.c -o my-prog";
}

And if you have multiple source files my-prog.c, f1.c and f2.c, they might look like this:

Listing 8.3.1.2 my-prog.c
#include <stdio.h>

extern int f1();
extern double f2();
int main()
{
  printf("function f1() returns %d\n", f1());
  printf("function f2() returns %g\n", f2());  /* f2() returns a double, so %g rather than %d */
  return 0;
}
Listing 8.3.1.3 f1.c
/* f1() returns the number 42 */
int f1()
{
  return 42;
}
Listing 8.3.1.4 f2.c
#include <math.h>

/* f2() returns the sin of PI/6 */
double f2()
{
  return sin(M_PI/6.0);   /* M_PI is defined in math.h */
}

And the process for compiling and linking those multiple source files might look like this:

digraph {

   {
   rank=same "compile *each* file with\ngcc -c file.c"
             "recompile *just*\nthat .c file with\ngcc -c file.c"
   }
   {
   rank=source "create C main\nprogram my-prog.c\nand files f1.c, f2.c"
   }
   {
   rank=sink "run it with ./my-prog"
   }

   edge [lblstyle="above, sloped"];
   "create C main\nprogram my-prog.c\nand files f1.c, f2.c" ->
   "compile *each* file with\ngcc -c file.c"
   [label="multiple C files"];
   "compile *each* file with\ngcc -c file.c" ->
   "link files together with\ngcc -o my-prog my-prog.o f1.o f2.o -lm" ->
   "run it with ./my-prog" ->
   "make changes to a\nsingle .c file" ->
   "recompile *just*\nthat .c file with\ngcc -c file.c" ->
   "link files together with\ngcc -o my-prog my-prog.o f1.o f2.o -lm";
}
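
The "recompile just that .c file" step in the diagram above is the part people tend to do by hand at first. As a rough sketch of the idea (the function names needs_recompile() and commands_to_run() are made up here, and the rule "the .o file is older than the .c file" is a simplifying assumption), here is a small python script that works out which gcc commands are needed. It only builds the list of commands and does not run them, so it works even on a machine without gcc.

```python
import os
import tempfile

def needs_recompile(c_file, o_file):
    """True if o_file is missing or older than c_file."""
    if not os.path.exists(o_file):
        return True
    return os.path.getmtime(c_file) > os.path.getmtime(o_file)

def commands_to_run(c_files, program):
    """List the gcc commands for an incremental build (does not run them)."""
    commands = []
    objects = []
    for c_file in c_files:
        o_file = os.path.splitext(c_file)[0] + '.o'
        objects.append(o_file)
        if needs_recompile(c_file, o_file):
            commands.append(['gcc', '-c', c_file, '-o', o_file])
    ## relink if anything was recompiled or the program does not exist yet
    if commands or not os.path.exists(program):
        commands.append(['gcc', '-o', program] + objects + ['-lm'])
    return commands

## demo with empty stand-in files and hand-set modification times
workdir = tempfile.mkdtemp()
c_files = [os.path.join(workdir, n) for n in ('my-prog.c', 'f1.c', 'f2.c')]
program = os.path.join(workdir, 'my-prog')
for path in c_files:
    open(path, 'w').close()
    os.utime(path, (1000, 1000))
first = commands_to_run(c_files, program)     ## nothing built yet

## pretend the build ran: object files and program are newer than sources
for path in c_files + [program]:
    built = path if path == program else os.path.splitext(path)[0] + '.o'
    open(built, 'w').close()
    os.utime(built, (2000, 2000))
os.utime(c_files[1], (3000, 3000))            ## now "edit" f1.c
second = commands_to_run(c_files, program)

print(len(first), 'commands for the first build')
print(len(second), 'commands after changing f1.c')
```

The first build needs three compile commands and one link; after touching only f1.c it needs one compile and one link. Tools like make automate this bookkeeping far more carefully than this sketch does.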

8.3.2. What are libraries and header files?

Terminology is important to understand what’s happening when you compile and link C (and C++) code. Let’s start by talking about libraries. When you start programming in C you are told to put this line at the top of your program:

#include <stdio.h>

and you might have thought in a muddled manner (as I did at first) “aha! that line links to the standard I/O library which gives me functions like printf()!!”

This will get you through your initial learning process, but a somewhat more precise narrative is:

Using a library in C involves two parts: one is telling your code what functions are available and what arguments they take, so that you can call them properly. The other is to link the library code with your code.

The first part (compile-time) is achieved by including a header file in your source code, with instructions like #include <math.h>. The second part (link-time) is achieved when you link your compiled code to the libraries you use. In the command line

$ gcc myprog.c -lm

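the -lm portion means “link to the library libm”, which is the C math library. The linker looks for it in its standard directories, for example as /usr/lib/libm.a or /usr/lib/libm.so (the exact path varies between systems).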
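
One way to convince yourself that the two parts really are separate is to use the C math library from python, where no header file is involved at all. This is only an illustration under some assumptions: it uses python's ctypes module, and it assumes that ctypes.util.find_library('m') can locate the math library on your system (if it cannot, the script just says so).

```python
import ctypes
import ctypes.util
import math

## locate the library: roughly the job that -lm asks the linker to do
libm_path = ctypes.util.find_library('m')
if libm_path is None:
    print('no separate C math library found on this system')
    result = None
else:
    libm = ctypes.CDLL(libm_path)
    ## describe the function by hand: this is the information that
    ## "#include <math.h>" would give to a C compiler
    libm.sin.argtypes = [ctypes.c_double]
    libm.sin.restype = ctypes.c_double
    result = libm.sin(math.pi / 6.0)
    print(result)
```

Loading the library gives you the code for sin(), while the argtypes and restype lines play the role of the declaration in the header. If you leave those two lines out, ctypes falls back to assuming integer types, which is loosely similar to what an old C compiler did with an undeclared function.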

So what happens if you forget the #include <math.h> or the -lm? Let’s try it out. Write a simple program:

#include <stdio.h>
#include <math.h>

int main()
{
    double x = sin(M_PI/3);
    printf("%g\n", x);
    return 0;
}

Comment out the #include <math.h> and try compiling with gcc myprog.c -lm. You will get a warning (or, with recent compilers, an error) to the effect that you have an “implicit declaration of sin” and an error that M_PI is undeclared. That’s because these things are declared in math.h.

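You can uncomment the #include <math.h> and compile again, but this time leave out the -lm and compile with gcc myprog.c. On many systems the compile step now succeeds but the link step fails with an error like “undefined reference to sin”, because the code for sin() lives in the math library. (Some compilers work out sin() of a constant at compile time, in which case you may need to pass a variable to sin() to see the error.)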

8.4. C: scope and resolving external variables

[not yet written] This is another one of those topics that shifts you from a “beginner learning the syntax of a language” to being a “person who can understand and design larger programs”.

Programs grow and become complex, and much of the business of software engineering is coming up with ways to tame the complexity of large programs.

The first of these steps dates back a very long time: compilation in separate files. When you compile