1. Motivation and plan
Section author: Mark Galassi <mark@galassi.org>
I cannot imagine a career more wonderful than that of a scientist.
The day-to-day work in science today involves using computers at all times. Scientists who master their computers and can program them with agility will enjoy the job the most, and are often in great demand: they can carry out unique new research. Young aspiring scientists who are not told this are being misled.
With an excellent group of young students, I have developed a series of lessons on scientific computing, aimed at kids who have already taken my “Serious Programming For Kids” course [Gal15]. I have two goals with these lessons:
introduce the tools and tricks for scientific computing, and
take a tour of diverse scientific problems that demonstrate “really interesting” things you can do with some programming knowledge.
These mini-courses teach scientific computing using Python on the GNU/Linux operating system. There are other possible choices of programming language and operating system, and some of them are adequate, but there are specific reasons for which I chose Python and GNU/Linux. Some are given in the “Serious Programming for Kids” teacher’s manual, but here are some other reasons which are specific to scientific work:
Scientific software often matures into sophisticated programs which need to be executed on production computers and in a reproducible manner. For this the use of a free/open-source operating system and language interpreter is crucial.
Much scientific infrastructure is available as an integral part of the GNU/Linux distributions. For example, on a current Debian GNU/Linux or Ubuntu or Fedora distribution you will find the GNU Scientific Library, astropy, scipy, and a remarkable number of R science packages. These packages are “just there” as part of the operating system. This comes in part from the fact that the GNU/Linux operating system is developed by hackers for hackers: programming is a seamless part of such systems.
Python spread rapidly soon after its initial development. Thanks to some key early developers coming from physics, astronomy and biology research groups, it was rapidly adopted by the scientific community. The result is that a vast collection of high quality scientific libraries is available in Python.
Many research projects have very long lives, and the software is used for years after it is first written. My opinion, and that of many who observe the world of scientific computing, is that programs written in Python on a GNU/Linux system will still run many years from now [1].
Reproducibility again: using proprietary software in scientific research makes it impossible to reproduce or verify a result: there is undisclosed code being executed!
Reproducibility and verifiability also dictate that scientific software should be able to run in batch mode, rather than through a graphical user interface (GUI). A GUI is not necessarily a bad thing, but after initial exploration of data with a GUI, the scientist needs to then generate a batch program to reproduce her results.
1.1. Notes for teachers
This is a teacher’s manual for the mini-courses. In the 10-hour “Serious Programming for Youth” workshop which introduces Python from scratch, I teach at a blackboard (or whiteboard nowadays).
This course is quite different: it is for students who have already taken the 10-hour workshop, and already have a laptop ready and running a GNU/Linux distribution.
The format is 1.5 hours, and I lecture with a projector or large TV screen, working on examples in emacs or in the command line.
While I lecture I have the students load the HTML version of this book, usually from a web site to which I sync it (at this time I use http://markgalassi.bitbucket.io/). This allows them to paste in code samples that are too long to type.
I usually project a couple of terminals (one for python snippets, one for shell commands), a browser window with the relevant chapter of this book, and the emacs editor. This allows the students to see how I work on the examples.
The lecturing style should be one of quickly getting a juicy example up on their screens: something that gives visible results for the students. Then step back a bit to make sure they understood how we got to it, and then quickly on to the next example.
Once they have worked some juicy examples, it’s time to lean back and have a broader discussion of the meaning of certain things, and to discuss the insight we got from an example. You can lace this with your favorite lecture on historical and philosophical aspects of what’s in this chapter, but you should then quickly pivot back to more work. This “get back to work and roll up your sleeves” is a crucial part of what we do.
Understanding this material is hard work for the students: I have developed this course to include serious material they might otherwise not learn until college, so I often ask the students to “suspend their not understanding” [2] and just latch on to one or two things they can remember. For example I introduce Fourier Analysis in Section 45.6, and when I give that lecture I frequently repeat “remember: it is OK to not understand most of this, but repeat after me the one thing I want you to understand: all these signals look like wild jumbles, but they are made up of simple waves which let us understand part of their musical nature.”
In broad strokes you can think of two main categories of scientific computing effort: analyzing data from experiments, and simulating your own physical situation with a computer program that generates fake (but, we hope, realistic) experimental data. We will look at both of these types, and introduce the words “experiment” and “simulation” as we go through the examples.
The way in which kids approach computers today (clicking and touching) allows them to not understand some concepts which are very important for scientific programming (and in fact any kind of programming). Because of this we must first get comfortable with the following concepts:
What is a data file.
How to plot a data file.
How to write a program which takes a data file, does some processing of the data, and writes out another file with the processed data.
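To make the third concept concrete, here is a minimal sketch of a program that reads a data file, processes it, and writes a new one. It assumes a plain text file with two whitespace-separated columns and `#` comment lines. The file names, the function names and the choice of a running average as the "processing" step are all made up for this illustration; they are not taken from the later chapters.

```python
import os
import tempfile


def read_columns(path):
    """Read a two-column text file, skipping blank lines and '#' comments."""
    data = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            x, y = line.split()[:2]
            data.append((float(x), float(y)))
    return data


def running_average(data, window=3):
    """Replace each y by the mean of itself and up to window-1 earlier values."""
    smoothed = []
    for i, (x, _) in enumerate(data):
        start = max(0, i - window + 1)
        recent = [y for _, y in data[start:i + 1]]
        smoothed.append((x, sum(recent) / len(recent)))
    return smoothed


def write_columns(path, data, header):
    """Write (x, y) pairs as two columns, with a one-line '#' header."""
    with open(path, "w") as f:
        f.write("# " + header + "\n")
        for x, y in data:
            f.write(f"{x} {y}\n")


def process_file(in_path, out_path, window=3):
    """Data file in, processed data file out."""
    data = read_columns(in_path)
    write_columns(out_path, running_average(data, window),
                  "time smoothed_temperature")


if __name__ == "__main__":
    # Hypothetical sample data, written to a scratch directory for the demo.
    with tempfile.TemporaryDirectory() as workdir:
        raw = os.path.join(workdir, "temperatures.dat")
        smooth = os.path.join(workdir, "temperatures_smooth.dat")
        sample = [(0, 20.0), (1, 21.5), (2, 19.0), (3, 22.5), (4, 23.0)]
        write_columns(raw, sample, "time temperature")
        process_file(raw, smooth)
        with open(smooth) as f:
            print(f.read())
```

Both the input and the output are ordinary text files, so either one can be opened in an editor or handed to a plotting program.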
Once we have these skills we can:
Tell the story of that plot.
Generate simulated data.
Retrieve data from online sources.
Record data from an experiment.
Analyze data to go beyond that initial story.
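As a taste of what “generate simulated data” can mean, here is a small sketch that produces fake measurements of a falling object: the true height follows h0 - g t²/2, and each "measurement" gets a bit of random noise added. The physical scenario, the parameter values and the function name `simulate_fall` are assumptions chosen for this illustration only.

```python
import random

G = 9.8  # acceleration of gravity in m/s^2 (approximate)


def simulate_fall(n_points=10, dt=0.1, h0=10.0, noise=0.05, seed=1234):
    """Return (time, measured_height) pairs for an object dropped from h0.

    noise is the standard deviation of the fake measurement error in meters;
    seed fixes the random numbers so that the run can be reproduced exactly.
    """
    rng = random.Random(seed)
    data = []
    for i in range(n_points):
        t = i * dt
        true_height = h0 - 0.5 * G * t**2
        measured = true_height + rng.gauss(0.0, noise)
        data.append((t, measured))
    return data


if __name__ == "__main__":
    # Print two columns so the output can be redirected into a data file.
    print("# time(s) height(m)")
    for t, h in simulate_fall():
        print(f"{t:.2f} {h:.4f}")
```

Because the program prints plain columns and uses a fixed seed, it runs in batch mode and gives the same fake data every time, which fits the reproducibility goals described earlier in this chapter.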
1.2. Acknowledgements
Thanks to Laura Fortunato and David Palmer for discussing this curriculum with me in detail before I developed it. Thanks to Jonathan Haack who has assisted me in teaching these courses and has given me feedback.
Thanks to my excellent Santa Fe students Lucas Blakeslee, Althea Foster, Alex Odom, Neha Sunkara, Rosa Birkner-Glidden, Miles Teng-Levy, Rowan Nadon, Teagan Boyes-Wetzel, Oisin O’Connell, Abby Wilson, Juan de la Riva, who have taken the course regularly and helped me develop it.
Most of all thanks to students and co-authors Leina Gries, Sophia Mulholland, Joaquin Bas for close collaboration on the book and for writing parts of it.
1.3. Status of the book
Some chapters are largely complete and just need polishing and proofreading; some have just a title; some are partially written.
Until the status is a bit more uniform, I will be putting a “readiness” status note at the top of the chapter. If you do not see such a status note then the chapter is probably not complete!
There is also an appendix on proposed chapters: Section 45.