9. Probing memory on a running Linux system
9.1. Motivation and plan
This lesson puts Python and C examples side by side, so it also gives some exposure to programming in both languages. Python programming brings great advantages in life, and C programming even deeper ones.
I will give a very small conceptual explanation of memory, and then see if the lesson here can turn that into a clear understanding.
Much of this might apply to other UNIX systems - it’s an interesting exercise to try it on a *BSD system.
Before starting we will want to have these programs installed:
sudo apt install gcc procps gnuplot
9.2. Concepts
- physical RAM
Fast, plentiful but not as plentiful as disk space, volatile (does not survive power-off).
- disk space
Slow, very plentiful (and, since you can just go buy more or use network-mounted space, it’s virtually infinite).
- dynamic memory allocation
A program starts out with some fixed amount of memory allocated to it for what the “run time support” already knows it will need. The program can then ask for more memory while it runs. It can also release that memory. The mechanisms are different for different languages, but at the low level it usually involves calling malloc() to request the memory from the operating system.
- virtual memory
If a program asks for more memory than the computer has in RAM then what happens? This is where virtual memory kicks in. The current program will be given the memory. To allow this a chunk (called a “page”) of some other program’s memory (or this program’s memory!) will be saved off to the swap area on the hard disk. This is called “swapping out a page”.
Once that page is saved to disk, its RAM is free to be used by the process that needs it.
So what happens when a program needs to use that “swapped out” memory again? It gets “swapped in”, returning to memory.
The virtual memory management keeps track of every “page” of memory that has been “swapped out” and is ready to “swap it back in”. It does so in such a way that the application program never needs to see it happen – it just requests and releases memory.
- thrashing
If virtual memory is used too much the computer can end up paging crazily, constantly swapping pages of RAM out to disk and bringing others back in from disk. This can make a computer grind to a halt because what should be very fast RAM access has to wait for a bunch of disk activity. It is important to understand this kind of performance problem, which shows up as a computer starts paging too much.
- garbage collection
In high level languages (formerly called “very high level languages”, VHLLs) when you create big lists and other objects, you are given the memory automatically. When you don’t use an object anymore the language’s run-time infrastructure will free up that memory. This is called garbage collection.
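To make the concepts above a bit more concrete, here is a minimal Python sketch that asks Linux how much virtual memory (VmSize) and how much physical RAM (VmRSS) the running process holds, before and after allocating a big string and after dropping it. It assumes a Linux system with /proc mounted; the helper names parse_status_kib() and my_memory() are made up for this illustration, and the numbers you see will depend on your Python version and C library.

```python
import os


def parse_status_kib(text, fields=("VmSize", "VmRSS")):
    """Pick selected 'Name:   1234 kB' fields out of /proc/<pid>/status text."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name in fields:
            values[name] = int(rest.split()[0])  # the kernel reports kB
    return values


def my_memory():
    """VmSize and VmRSS of this process (empty dict if there is no /proc)."""
    path = "/proc/self/status"
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return parse_status_kib(f.read())


if __name__ == "__main__":
    print("before allocation:", my_memory())
    block = b"x" * (50 * 1024 * 1024)   # 50 MB, every byte written
    print("after allocation: ", my_memory())
    del block                            # drop the only reference to it
    print("after release:    ", my_memory())
```

Whether the last line goes back down to the first value depends on how the run-time hands memory back to the operating system, so treat it as something to observe rather than something guaranteed.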
9.3. Simplest programs to look at memory concepts
This does not give us much insight into what is actually being done with that memory.
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <unistd.h>
#include <string.h>
#include <time.h>
/* compile with: */
/* gcc simple-malloc.c -o simple-malloc */
int main(int argc, char *argv[])
{
    if (argc != 3) {
        printf("dude, wrong args\n");
        printf("usage: %s size_MB duration_sec\n", argv[0]);
        exit(1);
    }
    unsigned long size = atol(argv[1]); /* in MB */
    int duration = atoi(argv[2]); /* in sec */
    unsigned long n_bytes = size * 1024 * 1024;
    printf("requesting %lu bytes for %d seconds\n", n_bytes, duration);
    char *ptr = malloc(n_bytes);
    if (ptr == NULL) { /* malloc() returns NULL when it cannot give us the memory */
        printf("could not allocate %lu bytes\n", n_bytes);
        exit(1);
    }
    memset(ptr, 'x', n_bytes);
    ptr[104] = 'y';
    ptr[107] = '\0';
    printf("%s\n", ptr);
    sleep(duration);
    free(ptr); /* we're done with it */
    return 0;
}
9.4. Using memory incorrectly
/* compile with "gcc -g -fno-stack-protector mem-trash.c -o mem-trash", run with "./mem-trash" */
#include <stdio.h>
#include <string.h>
#define A_CONSTANT 3.7
const float a_real_language_constant = 4.1;
int do_the_work(); /* prototype */
int main()
{
    int ret = do_the_work();
    return ret;
}
int do_the_work()
{
    char my_string[9];
    int important_array[8];
    int crucial_value;
    int i;
    crucial_value = 42;
    printf("just set crucial_value to: %d\n", crucial_value);
    /* deliberate bug: the string below is much longer than my_string[9] */
    strcpy(my_string, "this is a string that is longer than what I have allocated for it");
    printf("Just set my_string to be <%s>\n", my_string);
    printf("After setting my_string, crucial_value is: %d\n", crucial_value);
    for (i = 0; i < 8; ++i) {
        important_array[i] = i*i; /* fill this important array with the squares of numbers */
    }
    printf("After setting the array, my_string is <%s>\n", my_string);
    return 0;
}
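For a side-by-side contrast with mem-trash.c, here is a small Python sketch of the same kind of mistake: writing past the end of a 9-byte buffer. It is only an illustration under my own assumptions (the helper name try_store() is made up); the point is that Python checks the index and raises an exception instead of silently scribbling over whatever sits next to the buffer in memory.

```python
def try_store(buf, index, value):
    """Attempt buf[index] = value and report what happened."""
    try:
        buf[index] = value
        return "stored"
    except IndexError as err:
        # Python refuses the write; neighbouring objects are untouched
        return "refused: %s" % err


if __name__ == "__main__":
    my_string = bytearray(9)        # 9 bytes, like char my_string[9]
    crucial_value = 42
    print("index 3: ", try_store(my_string, 3, ord("a")))
    print("index 64:", try_store(my_string, 64, ord("a")))
    print("crucial_value is still", crucial_value)
```

The C version has no such check, which is why its output after the strcpy() can depend on the compiler, the flags and the stack layout.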
9.5. Lesson: preparing the programs
9.5.1. Programs we will use
top (from Debian package procps)
vmstat (package procps)
gnuplot (package gnuplot-x11 or gnuplot5-qt or gnuplot5-x11)
C compiler (package gcc)
Python interpreter (package python3 or python)
9.5.2. Writing programs to use and release memory
There are two accompanying programs: memory-use-py.py and memory-use-c.c which use a given amount of memory for a given amount of time. In Python it’s done by allocating a single huge string; in C by calling malloc() and then memset().
#! /usr/bin/env python3
import sys
## import gc # to explicitly free memory
from time import sleep
import os

def main():
    print('# arguments should be size to use (MB), second is duration (sec)')
    print('# argv: ', sys.argv, len(sys.argv))
    if len(sys.argv) != 3:
        print(' error in arguments: usage is:')
        print(' %s size(MB) time(sec)' % sys.argv[0])
        print(' example: "%s 500 40" will use 500 megabytes for 40 seconds'
              % sys.argv[0])
        sys.exit(1)
    size = int(sys.argv[1])
    duration = int(sys.argv[2])
    ## request big chunk of RAM, then write to each byte
    foo = ('x')*(size*1024*1024)
    #foo = ['abc' for x in range(10**7)]
    print('%d MB allocated; now sleeping for %d seconds' % (size, duration))
    os.system('date --iso=seconds')
    sleep(duration)
    os.system('date --iso=seconds')
    print('DONE; exiting')

if __name__ == '__main__':
    main()
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <unistd.h>
#include <string.h>
/* compile with: */
/* gcc memory-use-c.c -o memory-use-c */
int main(int argc, char *argv[])
{
    if (argc != 3) {
        printf(" error in arguments: usage is:\n");
        printf(" %s size(MB) time(sec)\n", argv[0]);
        printf(" example: \"%s 500 40\" will use 500 megabytes for 40 seconds\n",
               argv[0]);
        exit(1);
    }
    int size = atoi(argv[1]);
    int duration = atoi(argv[2]);
    /* NOTE: we are not doing any checking on the arguments to see that
       their values are valid numbers for size and duration; a good
       exercise is to improve the program for such checking, possibly
       with assert()
    */
    /* do the multiplication in size_t: as an int, size * 1024 * 1024
       overflows once size reaches 2048 */
    size_t n_bytes = (size_t) size * 1024 * 1024;
    char *ptr = malloc(n_bytes);
    if (ptr == NULL) {
        printf("could not allocate %d MB\n", size);
        exit(1);
    }
    /* NOTE: once we have allocated the memory, we should write
       something to it because a modern memory system might not actually
       grab the RAM until you write to it */
    memset(ptr, 'x', n_bytes);
    printf("%d MB allocated; now sleeping for %d seconds\n", size, duration);
    system("date --iso=seconds");
    sleep(duration);
    system("date --iso=seconds");
    printf("DONE; exiting\n");
    free(ptr);
    return 0;
}
Prepare to run them with:
chmod 755 memory-use-py.py
gcc memory-use-c.c -o memory-use-c
Then try the programs with brief runs:
./memory-use-py.py 100 10
./memory-use-c 100 10
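The NOTE in memory-use-c.c says that a modern memory system might not actually grab the RAM until you write to it. Here is a rough Python sketch of that effect, assuming Linux with /proc available: it maps 64 MB of anonymous memory with the standard mmap module, then writes one byte into every page, printing the resident size (VmRSS) at each step. The helper names rss_kib() and touch_pages() are made up for this illustration.

```python
import mmap


def rss_kib():
    """Resident set size of this process in KiB, or None without /proc."""
    try:
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return None


def touch_pages(region, page_size=mmap.PAGESIZE):
    """Write one byte into every page of region; return the page count."""
    touched = 0
    for offset in range(0, len(region), page_size):
        region[offset] = 1      # a single write is enough to need a real page
        touched += 1
    return touched


if __name__ == "__main__":
    size = 64 * 1024 * 1024
    print("RSS at start:         ", rss_kib(), "KiB")
    region = mmap.mmap(-1, size)    # anonymous mapping, not backed by a file
    print("RSS after mapping:    ", rss_kib(), "KiB")
    pages = touch_pages(region)
    print("touched %d pages" % pages)
    print("RSS after touching:   ", rss_kib(), "KiB")
    region.close()
```

On a typical Linux setup I would expect the resident size to stay roughly flat after the mapping and to jump only after the pages are touched, but that is worth checking on your own machine.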
9.6. Lesson: real time monitoring of memory with top
Start at least three terminals.
In one terminal type “top” and then type “M” so that the processes are sorted by memory use.
In top focus on the summary area at the top where it says “KiB Mem:” and “KiB Swap:” (newer versions of top may show “MiB” instead), as well as the top of the individual process section, where the processes using the most RAM are listed. Look at the “VIRT” and “RES” columns.
We will need to run pretty big memory takers to rise above the bloat of the web browser and other programs! So in the second terminal run:
./memory-use-py.py 3000 40
In the third terminal run:
./memory-use-c 3000 40
Watch how they evolve in “top”. Did they rise near the top of the memory use?
And did you come close to using up all physical RAM? That’s what the “KiB Mem:” and “KiB Swap:” lines can tell you.
Now see if you can make your system thrash.
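The memory and swap totals that top shows in its summary area can also be read from /proc/meminfo. The following Python sketch, which assumes a Linux system, parses that file and prints total RAM, available RAM and swap in use. The function names parse_meminfo() and summarize() are made up for this illustration; the field names (MemTotal, MemAvailable, SwapTotal, SwapFree) are the ones the Linux kernel uses, although MemAvailable is missing on very old kernels.

```python
import os


def parse_meminfo(text):
    """Turn 'MemTotal:  16384000 kB' style lines into a {name: number} dict."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[name.strip()] = int(parts[0])   # most values are in kB
    return info


def summarize(info):
    """Boil the dict down to the numbers that matter for this lesson."""
    return {
        "ram_total_kib": info["MemTotal"],
        "ram_available_kib": info.get("MemAvailable"),  # None on old kernels
        "swap_used_kib": info["SwapTotal"] - info["SwapFree"],
    }


if __name__ == "__main__":
    path = "/proc/meminfo"
    if os.path.exists(path):
        with open(path) as f:
            for key, value in summarize(parse_meminfo(f.read())).items():
                print("%-18s %s" % (key, value))
    else:
        print("no /proc/meminfo here; this sketch assumes Linux")
```

Running it a few times while the memory-use programs are active gives a quick check on whether swap is starting to be used.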
9.7. Long term monitoring of memory with vmstat
Run:
vmstat -t 1 | tee vmstat.out
The -t option puts a date and time stamp at the end of each line.
Note: If you are running this on a different UNIX-like system, like FreeBSD or macOS, the memory monitoring command might be a bit different. If vmstat does not run, you should try running vm_stat instead. The column which shows how much virtual memory is in use might be different, so you will need to change the plotting instruction below to plot a different column of numbers.
Do a bunch of stuff with the memory-using programs; make the runs all quite different and make them last a while.
When you are done, type “control-C” in the “vmstat” program.
Run:
$ gnuplot
gnuplot> plot 'vmstat.out' using 4 with lines
gnuplot> ## and another plot:
gnuplot> set multiplot layout 2,1
gnuplot> plot 'vmstat.out' using 4 with lines
gnuplot> plot 'vmstat.out' using 7 with lines
IMPROVEME: here’s another snippet which I have not yet written up properly:
gnuplot> set xdata time
gnuplot> set timefmt "%Y-%m-%d %H:%M:%S"
gnuplot> do for [t=0:50000] {
more> plot 'vmstat.out' using 18:4 with lines
more> pause 2
more> }
You can also couple it with
pcstat -t 1 | tee pcstat.out
and similar plotting stuff, although the time format is different.
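If you would rather inspect vmstat.out from Python than from gnuplot, here is a small sketch that parses the file. It assumes the Linux procps layout used in the plots above, where column 4 is free memory and column 7 is swap-in, and it takes the last two fields of each line as the date and time added by -t, since the exact number of columns can differ between vmstat versions. The function name parse_vmstat() is made up for this illustration.

```python
import os


def parse_vmstat(text, free_col=4, swap_in_col=7):
    """Return (timestamp, free, swap_in) tuples from 'vmstat -t' output.

    Column numbers are 1-based, as in the gnuplot 'using' clause.
    Header lines are skipped because they do not start with a digit.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < max(free_col, swap_in_col) + 2:
            continue
        if not fields[0].isdigit():
            continue
        stamp = fields[-2] + " " + fields[-1]
        rows.append((stamp,
                     int(fields[free_col - 1]),
                     int(fields[swap_in_col - 1])))
    return rows


if __name__ == "__main__":
    if os.path.exists("vmstat.out"):
        with open("vmstat.out") as f:
            rows = parse_vmstat(f.read())
        if rows:
            stamp, free, swap_in = min(rows, key=lambda row: row[1])
            print("lowest free memory: %d at %s (swap-in %d)"
                  % (free, stamp, swap_in))
    else:
        print("vmstat.out not found; run the vmstat command above first")
```

Finding the moment of lowest free memory is just one example; the same list of tuples could be fed to any plotting tool.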
9.8. Advanced memory monitoring with valgrind and massif
valgrind --tool=massif ./memory-use-c 100 10
ls -lt massif.out.* # from here glean the filename
massif-visualizer massif.out.PID
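On the Python side of the lesson, a rough counterpart to heap profiling with massif is the standard tracemalloc module, which records the memory blocks that the Python interpreter allocates. This is a minimal sketch rather than a replacement for massif: the helper name peak_bytes_while() is made up for this illustration, and tracemalloc only sees allocations made through Python's own allocator.

```python
import tracemalloc


def peak_bytes_while(func):
    """Run func() and return the peak number of bytes Python had allocated."""
    tracemalloc.start()
    try:
        func()
        current, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    size_mb = 20
    peak = peak_bytes_while(lambda: b"x" * (size_mb * 1024 * 1024))
    print("peak traced allocation: %.1f MB" % (peak / (1024.0 * 1024.0)))
```

For a fuller picture tracemalloc can also take snapshots and group allocations by source line, which is closer to what massif-visualizer shows for the C program.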
9.9. Further resources
Joyce Levine proposes these videos:
- https://www.youtube.com/watch?v=XV5sRaSVtXQ
- https://www.youtube.com/watch?v=9wydl0VFmeQ&list=PLEDF53DC200BAF48D&index=2 (this video was also helpful for me)
- https://www.youtube.com/watch?v=a25FQoBhng8&list=PLEDF53DC200BAF48D&index=3 (might be helpful for pointers, though it might just be confusing)