.. _sec-compiling-and-running-basics:

==============================
 Compiling and running basics
==============================

Motivation and plan
===================

This is a review, to make sure we are comfortable with the business
of building and running software from source code.

As usual I focus on Python and C.

Note that although I discuss aspects of the languages (Python and C),
the focus here is on how to *build* programs, and the interactions with
the file system.  That's what this hacker's compendium is mostly
about: not "pure" programming language information, but rather tricks
to be really productive when programming on your operating system.


Python
======

You don't need to worry about compiling python code, but you should
structure your python programs to execute cleanly.  You can do this by
having the following structure to your program:

.. code-block:: python
   :caption: simple-program.py -- not much here, just demonstrates a
             way to set up a ``main()`` function.

   #! /usr/bin/env python3

   def main():
       print('this is my main program')
       ## [all the rest of your main program]

   ## [other functions]

   if __name__ == '__main__':
       main()

You should then make the program executable and run it with:

.. code-block:: console

   $ chmod +x simple-program.py
   $ ./simple-program.py

In python you don't have the kind of separate compilation and linking
of binary files that you have with languages like C, C++ and FORTRAN,
but you can and should have your code distributed between separate
files when it starts growing.  This offers a nice conceptual
separation, and avoids too-long source files.

An example could be in the following three files:

.. code-block:: python
   :caption: prog-with-functions.py -- a program which calls two
             functions which are stored in separate files.

   #! /usr/bin/env python3

   import file_with_f1
   import file_with_f2

   def main():
       print('this is my main program; I will call f1')
       print('result:', file_with_f1.f1())
       print('and now I will call f2')
       print('result:', file_with_f2.f2())

   ## [other functions]

   if __name__ == '__main__':
       main()


.. code-block:: python
   :caption: file_with_f1.py -- a module with the function ``f1()`` in
             it.

   def f1():
       return 42


.. code-block:: python
   :caption: file_with_f2.py

   import math

   def f2():
       """f2() returns the sin of PI/6"""
       return math.sin(math.pi/6.0)  ## sin and pi are defined in math

Some things to note here:

* What's with that snippet:

  .. code-block:: python

     [...]
     if __name__ == '__main__':
         main()

  at the end of the file with the main program?  You don't need to
  remember this, but run through it once: if the file is executed
  directly then the global variable ``__name__`` is set to the string
  ``'__main__'``, which means we want to call ``main()``.  If the file
  is instead imported as a module, ``__name__`` is set to the module's
  name, so ``main()`` is not called.  That's why we put that litany at
  the end of python programs.

* The filename ``file_with_f1.py`` has underscores instead of hyphens.
  This is because the hyphen character is the same as a minus sign, so
  python syntax would *not* let you have something like ``import
  file-with-f1``.  The main program can have hyphens, but if there's
  any chance that you might some day turn it into a module and call
  it from another program, you should use underscores there too.

* Once you import the module you call the functions with the syntax
  ``MODULE.FUNC()``, for example ``file_with_f1.f1()``.  The ``import``
  instruction has other features so you can abbreviate the name of the
  module, or even skip it altogether, or selectively import only some
  functions from the file.

* The files ``file_with_f1.py`` and ``file_with_f2.py`` need to be in
  the same directory as the program that calls them.  You can put them
  in a different directory if you add that directory to the python
  list ``sys.path``.
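
The last three points can be tried out in one small experiment.  The
following is a minimal sketch, not part of the example files above: it
writes a hypothetical module ``demo_with_f1.py`` into a scratch
directory, adds that directory to ``sys.path``, and then imports the
module in the three ways just described.  The module name, the
variable ``name_when_imported`` and the scratch directory are all made
up for this illustration.

.. code-block:: python

   import importlib
   import sys
   import tempfile
   from pathlib import Path

   # Hypothetical module source, written to a scratch directory below.
   module_source = '''
   def f1():
       return 42

   # When this file is imported, __name__ is the module name,
   # not '__main__'.
   name_when_imported = __name__
   '''

   with tempfile.TemporaryDirectory() as scratch_dir:
       Path(scratch_dir, 'demo_with_f1.py').write_text(module_source)

       # Make the scratch directory visible to the import system.
       sys.path.append(scratch_dir)
       importlib.invalidate_caches()

       import demo_with_f1              # plain import: MODULE.FUNC()
       import demo_with_f1 as d1        # abbreviate the module name
       from demo_with_f1 import f1      # import only f1, no prefix needed

       results = (demo_with_f1.f1(), d1.f1(), f1())
       imported_name = demo_with_f1.name_when_imported

       # Undo the change to sys.path before the directory is deleted.
       sys.path.remove(scratch_dir)

   print(results)
   print(imported_name)

This should print ``(42, 42, 42)`` followed by ``demo_with_f1``: all
three import styles reach the same function, and inside an imported
file ``__name__`` holds the module name rather than ``'__main__'``.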


C
=

Flow of compiling and linking
-----------------------------

In C (and C++ and FORTRAN and other languages that are typically
*compiled*) your program will often consist of one or more ``.c``
files.  When you first learn to program in C you might have a single
file ``my-prog.c``:

.. code-block:: c
   :caption: my-prog.c

   #include <stdio.h>
   int main()
   {
     printf("hello world\n");
     return 0;
   }

and you might *compile* it and run it like this:

.. code-block:: console

   $ gcc my-prog.c -o my-prog
   $ ./my-prog
   hello world
   $ 

But as your program grows bigger you will find that "separate
compilation" is a big deal in C: you will often have many files that
you compile separately, after which you *link* them together into a
single *executable*.  The process for a single source file might look
like this:

.. graphviz::

   digraph {

      {
      rank=source "create C main\nprogram my-prog.c"
      }
      {
      rank=sink "run it with ./my-prog"
      }


      edge [lblstyle="above, sloped"];
      "create C main\nprogram my-prog.c" ->
      "compile it with\ngcc my-prog.c -o my-prog"
      [label="single C file"];
      "compile it with\ngcc my-prog.c -o my-prog" ->
      "run it with ./my-prog" ->
      "make changes to my-prog.c" ->
      "compile it with\ngcc my-prog.c -o my-prog";
   }

And if you have multiple source files ``my-prog.c``, ``f1.c`` and
``f2.c``, they might look like this:

.. code-block:: c
   :caption: my-prog.c

   #include <stdio.h>

   extern int f1();
   extern double f2();
   int main()
   {
     printf("function f1() returns %d\n", f1());
     printf("function f2() returns %g\n", f2());
     return 0;
   }

.. code-block:: c
   :caption: f1.c

   /* f1() returns the number 42 */
   int f1()
   {
     return 42;
   }

.. code-block:: c
   :caption: f2.c

   #include <math.h>

   /* f2() returns the sin of PI/6 */
   double f2()
   {
     return sin(M_PI/6.0);   /* M_PI is defined in math.h */
   }

And the process for compiling and linking those multiple source files
might look like this:

.. graphviz::

   digraph {

      {
      rank=same "compile *each* file with\ngcc -c file.c"
                "recompile *just*\nthat .c file with\ngcc -c file.c"
      }
      {
      rank=source "create C main\nprogram my-prog.c\nand files f1.c, f2.c"
      }
      {
      rank=sink "run it with ./my-prog"
      }

      edge [lblstyle="above, sloped"];
      "create C main\nprogram my-prog.c\nand files f1.c, f2.c" ->
      "compile *each* file with\ngcc -c file.c"
      [label="multiple C files"];
      "compile *each* file with\ngcc -c file.c" ->
      "link files together with\ngcc -o my-prog my-prog.o f1.o f2.o -lm" ->
      "run it with ./my-prog" ->
      "make changes to a\nsingle .c file" ->
      "recompile *just*\nthat .c file with\ngcc -c file.c" ->
      "link files together with\ngcc -o my-prog my-prog.o f1.o f2.o -lm";
   }
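
The "recompile *just* that file" step in the diagram depends on
knowing which object files are out of date.  As a rough illustration
(one possible way to do the bookkeeping, not a tool this text relies
on), the Python sketch below compares the modification time of each
``.c`` file with that of its ``.o`` file.  The helper name
``stale_sources()`` and the fake timestamps are invented for the
example, and no compiler is actually run.

.. code-block:: python

   import os
   import tempfile
   from pathlib import Path

   def stale_sources(source_paths):
       """Return the sources whose .o file is missing or older."""
       stale = []
       for source in source_paths:
           obj = source.with_suffix('.o')
           if not obj.exists() or obj.stat().st_mtime < source.stat().st_mtime:
               stale.append(source)
       return stale

   with tempfile.TemporaryDirectory() as scratch_dir:
       names = ('my-prog.c', 'f1.c', 'f2.c')
       sources = [Path(scratch_dir, name) for name in names]
       for source in sources:
           source.write_text('/* placeholder */\n')
           os.utime(source, (1000, 1000))   # pretend: last edited at t=1000
           obj = source.with_suffix('.o')
           obj.write_text('')
           os.utime(obj, (2000, 2000))      # pretend: compiled at t=2000

       # Simulate editing just f1.c after the last build.
       os.utime(sources[1], (3000, 3000))

       to_recompile = [s.name for s in stale_sources(sources)]

   print(to_recompile)

Only ``f1.c`` is reported, so only ``gcc -c f1.c`` would need to be
rerun before linking the three ``.o`` files again.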


What are libraries and header files?
------------------------------------

Terminology is important to understand what's happening when you
compile and link C (and C++) code.  Let's start by talking about
*libraries*.  When you start programming in C you are told to put this
line at the top of your program:

.. code-block:: c

   #include <stdio.h>

and you might have thought in a muddled manner (as I did at first)
"aha! that line links to the standard I/O library which gives me
functions like printf()!!"

This will get you through your initial learning process, but let's try
to make that narrative more precise:

   Using a library in C involves two parts: one is telling your code
   what functions are available and what arguments they take, so that
   you can call them properly.  The other is to link the library code
   with your code.

   The first part (compile-time) is achieved by *including a header
   file* in your source code, with instructions like ``#include
   <math.h>``.  The second part (link-time) is achieved when you
   *link* your compiled code to the libraries you use.  In the
   command line

   .. code-block:: console

      $ gcc myprog.c -lm

   the ``-lm`` portion means "link to the C math library ``libm``",
   which the linker finds in a file such as ``/usr/lib/libm.a`` (the
   exact file and location vary between systems).

So what happens if you forget the ``#include <math.h>`` or the
``-lm``?  Let's try it out.  Write a simple program:

.. code-block:: c

   #include <stdio.h>
   #include <math.h>

   int main()
   {
       double x = sin(M_PI/3);
       printf("%g\n", x);
       return 0;
   }

Comment out the ``#include <math.h>`` and try compiling with ``gcc
myprog.c -lm``.  You will get a warning (or, with recent compilers, an
error) to the effect that you have an "implicit declaration of sin",
and an error that ``M_PI`` is undeclared.  That's because these things
are declared in ``math.h``.

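You can uncomment the ``#include <math.h>`` and compile again, but
this time leave out the ``-lm`` and compile with ``gcc myprog.c``.
Now the compile step is happy, and any complaint comes from the
linker: typically an "undefined reference to sin".  On some systems
you will see no error at all, for example when the compiler evaluates
``sin()`` of a constant at compile time, or when the math functions
live in the main C library.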


C: scope and resolving external variables
=========================================
..

   [not yet written] This is another one of those topics that shifts you
   from a "beginner learning the syntax of a language" to being a "person
   who can understand and design larger programs".

   Programs grow and become complex, and much of the business of software
   engineering is coming up with ways to tame the complexity of large
   programs.

   The first of these steps dates back a very long time: compilation in
   separate files.  When you compile