Introduction to Computer Systems (ICS)

朱琪 · December 19, 2020

Homepage

Resources

Unix/Linux Command Reference

File Commands

ls – directory listing
ls -al – formatted listing with hidden files
cd dir – change directory to dir
cd – change to home
cd - – change back to previous directory
pwd – show current directory
mkdir dir – create a directory dir
rm file – delete file
rm -r dir – delete directory dir
rm -f file – force remove file
rm -rf dir – force remove directory dir *
cp file1 file2 – copy file1 to file2
cp -r dir1 dir2 – copy dir1 to dir2; create dir2 if it doesn’t exist
mv file1 file2 – rename or move file1 to file2
if file2 is an existing directory, moves file1 into directory file2
ln -s file link – create symbolic link link to file
touch file – create or update file
cat > file – places standard input into file
more file – output the contents of file
head file – output the first 10 lines of file
tail file – output the last 10 lines of file
tail -f file – output the contents of file as it grows, starting with the last 10 lines

Process Management

ps – display your currently active processes
top – display all running processes
kill pid – kill process id pid
killall proc – kill all processes named proc *
jobs – lists stopped or background jobs
bg – resume a stopped job in the background
fg – brings the most recent job to foreground
fg n – brings job n to the foreground

File Permissions

chmod octal file – change the permissions of file to octal, which can be found separately for user, group, and world by adding:

  • 4 – read (r)
  • 2 – write (w)
  • 1 – execute (x)

Examples:
chmod 777 – read, write, execute for all
chmod 755 – rwx for owner, rx for group and world
For more options, see man chmod.

SSH

ssh [-p port] user@host – connect to host as user, optionally on custom port port
scp [-P port] user@host:path1 path2 – copy the remote file at path1 to local location path2
ssh-copy-id user@host – add your key to host for user to enable a keyed or passwordless login

Searching

grep pattern files – search for pattern in files
grep -r pattern dir – search for pattern in dir
command | grep pattern – search for pattern in the output of command
locate file – find all instances of file\

System Info

date – show the current date and time
cal – show this month’s calendar
uptime – show current uptime
w – display who is online
whoami – who you are logged in as
finger user – display information about user
uname -a – show kernel information
cat /proc/cpuinfo – cpu information
cat /proc/meminfo – memory information
man command – show the manual for command
df – show disk usage
du – show directory space usage
free – show memory and swap usage
whereis app – show possible locations of app
which app – show which app will be run by default
type app – show which app or builtin will be run

Compression

tar cf file.tar files – create a tar named file.tar containing files
tar xf file.tar – extract the files from file.tar
tar czf file.tar.gz files – create a tar with Gzip compression
tar xzf file.tar.gz – extract a tar using Gzip
tar cjf file.tar.bz2 – create a tar with Bzip2 compression
tar xjf file.tar.bz2 – extract a tar using Bzip2
gzip file – compresses file and renames it to file.gz
gzip -d file.gz – expands file.gz back to file

Network

ping host – ping host and output results
whois domain – get whois information for domain
dig domain – get DNS information for domain
dig -x host – reverse lookup host
wget file – download file
wget -c file – continue a stopped download

Installation

Install from source:
./configure
make
make install
dpkg -i pkg.deb – install a package (Debian)
rpm -Uvh pkg.rpm – install a package (RPM)

Shortcuts

Ctrl+C – halts the current command
Ctrl+Z – stops the current command, resume with
fg in the foreground or bg in the background
Ctrl+D – log out of current session, similar to exit
Ctrl+W – erases one word in the current line
Ctrl+U – erases the whole line
Ctrl+R – type to bring up a recent command
!! - repeats the last command
exit – log out of current session

Lecture 1: Course Overview

hello.c, Source program (text) -> Preprocessor(cpp) -> hello.i, Modified source program (text) -> Compiler(cc1) -> hello.s, Assembly program (text) -> Assembler(as) -> printf.o hello.o, Relocatable object programs (binary) -> Linker(ld) -> hello, Executable object program (binary)

A file is a sequence of bytes, nothing more and nothing less. Every I/O device, including disks, keyboards, displays, and even networks, is modeled as a file.

The operating system kernel serves as an intermediary between the application and the hardware. It provides three fundamental abstractions: (1) Files are abstractions for I/O devices. (2) Virtual memory is an abstraction for both main memory and disks. (3) Processes are abstractions for the processor, main memory, and I/O devices.

linux> gcc -m32 prog.c then this program will run correctly on either a 32-bit or a 64-bit machine. On the other hand, a program compiled with the directive

linux> gcc -m64 prog.c

will only run on a 64-bit machine.

0x01234567

Big endian

0x100 0x101 0x102 0x103

01 23 45 67

Little endian 0x100 0x101 0x102 0x103

67 45 23 01

Machine-Level Representation of Programs

linux> gcc -Og -S mstore.c This will cause gcc to run the compiler, generating an assembly file mstore.s, and go no further.

If we use the -c command-line option, gcc will both compile and assemble the code linux> gcc -Og -c mstore.c This will generate an object-code file mstore.o that is in binary format and hence cannot be viewed directly.

Lecture 13: Linking

Linking:

Programs are translated and linked using a compiler driver:

linux> gcc -Og -o prog main.c sum.c

linux> ./prog

Translators (cpp, cc1, as)

Space Efficiency:

Option 1: Static Linking

  • Executable files and running memory images contain only the library code they actually use

Option 2: Dynamic linking

  • Executable files contain no library code
  • During execution, single copy of library code can be shared across all executing processes

What Do Linkers Do?

Step 1: Symbol resolution

  • Programs define and reference symbols (global variables and functions):

  • Symbol definitions are stored in object file (by assembler) in symbol table.
    • Symbol table is an array of entries
    • Each entry includes name, size, and location of symbol.
  • During symbol resolution step, the linker associates each symbol reference with exactly one symbol definition..

What Do Linkers Do? (cont’d)

Step 2: Relocation

  • Merges separate code and data sections into single sections
  • Relocates symbols from their relative locations in the .o files to their final absolute memory locations in the executable.
  • Updates all references to these symbols to reflect their new positions.

Three Kinds of Object Files (Modules)

  • Relocatable object file (.o file)
    • Contains code and data in a form that can be combined with other relocatable object files to form executable object file.
      • Each .o file is produced from exactly one source (.c) file
  • Executable object file (a.out file)
    • Contains code and data in a form that can be copied directly into memory and then executed.
  • Shared object file (.so file)
    • Special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run-time.
    • Called Dynamic Link Libraries (DLLs) by Windows

Executable and Linkable Format (ELF)

  • Standard binary format for object files

  • One unified format for
    • Relocatable object files (.o),
    • Executable object files (a.out)
    • Shared object files (.so)
  • Generic name: ELF binaries

ELF Object File Format

  • Elf header
    • Word size, byte ordering, file type (.o, exec, .so), machine type, etc.
  • Segment header table
    • Page size, virtual address memory segments (sections), segment sizes.
  • .text section
    • Code
  • .rodata section
    • Read only data: jump tables, string constants, …
  • .data section
    • Initialized global variables
  • .bss section
    • Uninitialized global variables
    • “Block Started by Symbol”
    • “Better Save Space”
    • Has section header but occupies no space

ELF Header

  • .symtab section
    • Symbol table
    • Procedure and static variable names
    • Section names and locations
  • .rel.text section
    • Relocation info for .text section
    • Addresses of instructions that will need to be modified in the executable
    • Instructions for modifying
  • .rel.data section
    • Relocation info for .data section
    • Addresses of pointer data that will need to be modified in the merged executable
  • .debug section
    • Info for symbolic debugging (gcc -g)
  • Section header table
    • Offsets and sizes of each section

Linker Symbols

  • Global symbols
    • Symbols defined by module m that can be referenced by other modules.
    • e.g., non-static C functions and non-static global variables.
  • External symbols
    • Global symbols that are referenced by module m but defined by some other module.
  • Local symbols
    • Symbols that are defined and referenced exclusively by module m.
    • e.g, C functions and global variables defined with the static attribute.
    • Local linker symbols are not local program variables

Local Symbols

  • Local non-static C variables vs. local static C variables
    • Local non-static C variables: stored on the stack
    • Local static C variables: stored in either .bss or .data
static int x = 15;
int f() {
static int x = 17;
return x++;
}
int g() {
static int x = 19;
return x += 14;
}
int h() {
return x += 27;
}

Compiler allocates space in .data for each definition of x

Creates local symbols in the symbol table with unique names, e.g., x, x.1721 and x.1724.

How Linker Resolves Duplicate Symbol Definitions

  • Program symbols are either strong or weak
    • Strong: procedures and initialized globals
    • Weak: uninitialized globals
      • Or ones declared with specifier extern

Linker’s Symbol Rules

  • Rule 1: Multiple strong symbols are not allowed
    • Each item can be defined only once
    • Otherwise: Linker error
  • Rule 2: Given a strong symbol and multiple weak symbols, choose the strong symbol
    • References to the weak symbol resolve to the strong symbol
  • Rule 3: If there are multiple weak symbols, pick an arbitrary one
    • Can override this with gcc –fno-common

Important: Linker does not do type checking.

Global Variables

  • Avoid if you can
  • Otherwise
    • Use static if you can
    • Initialize if you define a global variable
    • Use extern if you reference an external global variable
      • Treated as weak symbol
      • But also causes linker error if not defined in some file

Loading Executable Object Files

ELF Header

Look at addresses returned by malloc: actually two heaps: have separate allocation algorithms for large objects and small objects.

Libraries: Packaging a Set of Functions

Old-Fashioned Solution: Static Libraries

Static libraries (.a archive files)

  • Concatenate related relocatable object files into a single file with an index (called an archive).
  • Enhance linker so that it tries to resolve unresolved external references by looking for the symbols in one or more archives.
  • If an archive member file resolves reference, link it into the executable.

ELF Header

Commonly Used Libraries

libc.a (the C standard library)

  • 4.6 MB archive of 1496 object files.
  • I/O, memory allocation, signal handling, string handling, data and time, random numbers, integer math

libm.a (the C math library)

  • 2 MB archive of 444 object files.
  • floating point math (sin, cos, tan, log, exp, sqrt, …)

Using Static Libraries

  • Linker’s algorithm for resolving external references:
    • Scan .o files and .a files in the command line order.
    • During the scan, keep a list of the current unresolved references.
    • As each new .o or .a file, obj, is encountered, try to resolve each unresolved reference in the list against the symbols defined in obj.
    • If any entries in the unresolved list at end of scan, then error.
  • Problem:
    • Command line order matters!
    • Moral: put libraries at the end of the command line.
unix> gcc -static -o prog2c -L. -lvector main2.o
main2.o: In function `main':
main2.c:(.text+0x19): undefined reference to `addvec'
collect2: error: ld returned 1 exit status

Modern Solution: Shared Libraries

  • Static libraries have the following disadvantages:
    • Duplication in the stored executables (every function needs libc)
    • Duplication in the running executables
    • Minor bug fixes of system libraries require each application to explicitly relink
      • Rebuild everything with glibc?
      • https://security.googleblog.com/2016/02/cve-2015-7547-glibcgetaddrinfo-stack.html
  • Modern solution: shared libraries
    • Object files that contain code and data that are loaded and linked into an application dynamically, at either load-time or run-time
    • Also called: dynamic link libraries, DLLs, .so files
  • Dynamic linking can occur when executable is first loaded and run (load-time linking)
    • Common case for Linux, handled automatically by the dynamic linker (ld-linux.so)
    • Standard C library (libc.so) usually dynamically linked
  • Dynamic linking can also occur after program has begun (run-time linking)
    • In Linux, this is done by calls to the dlopen() interface
      • Distributing software
      • High-performance web servers
      • Runtime library interpositioning
  • Shared library routines can be shared by multiple processes
    • More on this when we learn about virtual memory

Where are the libraries found?

  • Use “ldd” to find out:
unix> ldd prog
linux-vdso.so.1 => (0x00007ffcf2998000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f99ad927000)
/lib64/ld-linux-x86-64.so.2 (0x00007f99adcef000)

Dynamic Linking at Load Time

Dynamic Linking at Run-time

#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
int x[2] = {1, 2};
int y[2] = {3, 4};
int z[2];
int main(int argc, char** argv)
{
    void *handle;
    void (*addvec)(int *, int *, int *, int);
    char *error;

    /* Dynamically load the shared library that contains addvec() */
    handle = dlopen("./libvector.so", RTLD_LAZY);
    if (!handle) {
        fprintf(stderr, "%s\n", dlerror());
        exit(1);
    }
    . . .

    /* Get a pointer to the addvec() function we just loaded */
    addvec = dlsym(handle, "addvec");
    if ((error = dlerror()) != NULL) {
        fprintf(stderr, "%s\n", error);
        exit(1);
    }

    /* Now we can call addvec() just like any other function */
    addvec(x, y, z, 2);
    printf("z = [%d %d]\n", z[0], z[1]);

    /* Unload the shared library */
    if (dlclose(handle) < 0) {
        fprintf(stderr, "%s\n", dlerror());
        exit(1);
    }
    return 0;
}

Library interpositioning

powerful linking technique that allows programmers to intercept calls to arbitrary functions

Interpositioning can occur at:

  • Compile time: When the source code is compiled
  • Link time: When the relocatable object files are statically linked to form an executable object file
  • Load/run time: When an executable object file is loaded into memory, dynamically linked, and then executed.

Some Interpositioning Applications

  • Security
    • Confinement (sandboxing)
    • Behind the scenes encryption
  • Debugging
    • In 2014, two Facebook engineers debugged a treacherous 1-year old bug in their iPhone app using interpositioning
    • Code in the SPDY networking stack was writing to the wrong location
    • Solved by intercepting calls to Posix write functions (write, writev, pwrite)

Source: Facebook engineering blog post at: https://code.facebook.com/posts/313033472212144/debugging-file-corruption-on-ios/

  • Monitoring and Profiling
    • Count number of calls to functions
    • Characterize call sites and arguments to functions
    • Malloc tracing
      • Detecting memory leaks
      • Generating address traces
  • Error Checking
    • C Programming Lab used customized versions of malloc/free to do careful error checking
    • Other labs (malloc, shell, proxy) also use interpositioning to enhance checking capabilities
  • The “-Wl” flag passes argument to linker, replacing each comma with a space.
  • The “–wrap,malloc ” arg instructs linker to resolve references in a special way:
    • Refs to malloc should be resolved as __wrap_malloc
    • Refs to __real_malloc should be resolved as malloc

The LD_PRELOAD environment variable tells the dynamic linker to resolve unresolved refs (e.g., to malloc)by looking in mymalloc.so first.

Lecture 22 Network Programming

A Client-Server Transaction

Most network applications are based on the client-server model:

Note: clients and servers are processes running on hosts (can be the same or different hosts)

Global IP Internet (upper case)

  • Based on the TCP/IP protocol family
    • IP (Internet Protocol)
      • Provides basic naming scheme and unreliable delivery capability of packets (datagrams) from host-to-host
    • UDP (Unreliable Datagram Protocol)
      • Uses IP to provide unreliable datagram delivery from process-to-process
    • TCP (Transmission Control Protocol)
      • Uses IP to provide reliable byte streams from process-to-process over connections
  • Accessed via a mix of Unix file I/O and functions from the sockets interface

Global IP Internet

/* Internet address structure */
struct in_addr {
uint32_t s_addr; /* network byte order (big-endian) */
};

hostname -i 127.0.1.1 hostname omnisky nslookup localhost Server: 8.8.8.8 Address: 8.8.8.8#53

** server can’t find localhost: NXDOMAIN Use getaddrinfo and getnameinfo functions (described later) to convert between IP addresses and dotted decimal format.

Domain Naming System (DNS)

The Internet maintains a mapping between IP addresses and domain names in a huge worldwide distributed database called DNS

Conceptually, programmers can view the DNS database as a collection of millions of host entries. * In a mathematical sense, a host entry is an equivalence class of domain names and IP addresses.

Can explore properties of DNS mappings using nslookup

Each host has a locally defined domain name localhost which always maps to the loopback address 127.0.0.1

Internet Connections

  • Clients and servers communicate by sending streams of bytes over connections. Each connection is:
    • Point-to-point: connects a pair of processes.
    • Full-duplex: data can flow in both directions at the same time,
    • Reliable: stream of bytes sent by the source is eventually received by the destination in the same order it was sent.
  • A socket is an endpoint of a connection
    • Socket address is an IPaddress:port pair
  • A port is a 16-bit integer that identifies a process:
    • Ephemeral port: Assigned automatically by client kernel when client makes a connection request.
    • Well-known port: Associated with some service provided by a server (e.g., port 80 is associated with Web servers)

Mappings between well-known ports and service names is contained in the file /etc/services on each Linux machine.

Sockets Interface

Set of system-level functions used in conjunction with Unix I/O to build network applications.

Created in the early 80’s as part of the original Berkeley distribution of Unix that contained an early version of the Internet protocols.

Available on all modern systems(Unix variants, Windows, OS X, IOS, Android, ARM)

Remember: All Unix I/O devices, including networks, are modeled as files

Twitter, Facebook