Resources
Unix/Linux Command Reference
File Commands
ls – directory listing
ls -al – formatted listing with hidden files
cd dir – change directory to dir
cd – change to home
cd - – change back to previous directory
pwd – show current directory
mkdir dir – create a directory dir
rm file – delete file
rm -r dir – delete directory dir
rm -f file – force remove file
rm -rf dir – force remove directory dir *
cp file1 file2 – copy file1 to file2
cp -r dir1 dir2 – copy dir1 to dir2; create dir2 if it
doesn’t exist
mv file1 file2 – rename or move file1 to file2
if file2 is an existing directory, moves file1 into
directory file2
ln -s file link – create symbolic link link to file
touch file – create or update file
cat > file – places standard input into file
more file – output the contents of file
head file – output the first 10 lines of file
tail file – output the last 10 lines of file
tail -f file – output the contents of file as it
grows, starting with the last 10 lines
Process Management
ps – display your currently active processes
top – display all running processes
kill pid – kill process id pid
killall proc – kill all processes named proc *
jobs – lists stopped or background jobs
bg – resume a stopped job in the background
fg – brings the most recent job to foreground
fg n – brings job n to the foreground
File Permissions
chmod octal file – change the permissions of file to octal, which can be found separately for user, group, and world by adding:
- 4 – read (r)
- 2 – write (w)
- 1 – execute (x)
Examples:
chmod 777 – read, write, execute for all
chmod 755 – rwx for owner, rx for group and world
For more options, see man chmod.
SSH
ssh [-p port] user@host – connect to host as user,
optionally on custom port port
scp [-P port] user@host:path1 path2 – copy the
remote file at path1 to local location path2
ssh-copy-id user@host – add your key to host for
user to enable a keyed or passwordless login
Searching
grep pattern files – search for pattern in files
grep -r pattern dir – search for pattern in dir
command | grep pattern – search for pattern in the
output of command
locate file – find all instances of file\
System Info
date – show the current date and time
cal – show this month’s calendar
uptime – show current uptime
w – display who is online
whoami – who you are logged in as
finger user – display information about user
uname -a – show kernel information
cat /proc/cpuinfo – cpu information
cat /proc/meminfo – memory information
man command – show the manual for command
df – show disk usage
du – show directory space usage
free – show memory and swap usage
whereis app – show possible locations of app
which app – show which app will be run by default
type app – show which app or builtin will be run
Compression
tar cf file.tar files – create a tar named
file.tar containing files
tar xf file.tar – extract the files from file.tar
tar czf file.tar.gz files – create a tar with
Gzip compression
tar xzf file.tar.gz – extract a tar using Gzip
tar cjf file.tar.bz2 – create a tar with Bzip2
compression
tar xjf file.tar.bz2 – extract a tar using Bzip2
gzip file – compresses file and renames it to
file.gz
gzip -d file.gz – expands file.gz back to file
Network
ping host – ping host and output results
whois domain – get whois information for domain
dig domain – get DNS information for domain
dig -x host – reverse lookup host
wget file – download file
wget -c file – continue a stopped download
Installation
Install from source:
./configure
make
make install
dpkg -i pkg.deb – install a package (Debian)
rpm -Uvh pkg.rpm – install a package (RPM)
Shortcuts
Ctrl+C – halts the current command
Ctrl+Z – stops the current command, resume with
fg in the foreground or bg in the background
Ctrl+D – log out of current session, similar to exit
Ctrl+W – erases one word in the current line
Ctrl+U – erases the whole line
Ctrl+R – type to bring up a recent command
!! - repeats the last command
exit – log out of current session
Lecture 1: Course Overview
hello.c, Source program (text) -> Preprocessor(cpp) -> hello.i, Modified source program (text) -> Compiler(cc1) -> hello.s, Assembly program (text) -> Assembler(as) -> printf.o hello.o, Relocatable object programs (binary) -> Linker(ld) -> hello, Executable object program (binary)
A file is a sequence of bytes, nothing more and nothing less. Every I/O device, including disks, keyboards, displays, and even networks, is modeled as a file.
The operating system kernel serves as an intermediary between the application and the hardware. It provides three fundamental abstractions: (1) Files are abstractions for I/O devices. (2) Virtual memory is an abstraction for both main memory and disks. (3) Processes are abstractions for the processor, main memory, and I/O devices.
linux> gcc -m32 prog.c then this program will run correctly on either a 32-bit or a 64-bit machine. On the other hand, a program compiled with the directive
linux> gcc -m64 prog.c
will only run on a 64-bit machine.
0x01234567
Big endian
0x100 0x101 0x102 0x103
01 23 45 67
Little endian 0x100 0x101 0x102 0x103
67 45 23 01
Machine-Level Representation of Programs
linux> gcc -Og -S mstore.c This will cause gcc to run the compiler, generating an assembly file mstore.s, and go no further.
If we use the -c command-line option, gcc will both compile and assemble the code linux> gcc -Og -c mstore.c This will generate an object-code file mstore.o that is in binary format and hence cannot be viewed directly.
Lecture 13: Linking
Linking:
Programs are translated and linked using a compiler driver:
linux> gcc -Og -o prog main.c sum.c
linux> ./prog
Translators (cpp, cc1, as)
Space Efficiency:
Option 1: Static Linking
- Executable files and running memory images contain only the library code they actually use
Option 2: Dynamic linking
- Executable files contain no library code
- During execution, single copy of library code can be shared across all executing processes
What Do Linkers Do?
Step 1: Symbol resolution
-
Programs define and reference symbols (global variables and functions):
- Symbol definitions are stored in object file (by assembler) in symbol table.
- Symbol table is an array of entries
- Each entry includes name, size, and location of symbol.
- During symbol resolution step, the linker associates each symbol reference with exactly one symbol definition..
What Do Linkers Do? (cont’d)
Step 2: Relocation
- Merges separate code and data sections into single sections
- Relocates symbols from their relative locations in the .o files to their final absolute memory locations in the executable.
- Updates all references to these symbols to reflect their new positions.
Three Kinds of Object Files (Modules)
- Relocatable object file (.o file)
- Contains code and data in a form that can be combined with other
relocatable object files to form executable object file.
- Each .o file is produced from exactly one source (.c) file
- Contains code and data in a form that can be combined with other
relocatable object files to form executable object file.
- Executable object file (a.out file)
- Contains code and data in a form that can be copied directly into memory and then executed.
- Shared object file (.so file)
- Special type of relocatable object file that can be loaded into memory and linked dynamically, at either load time or run-time.
- Called Dynamic Link Libraries (DLLs) by Windows
Executable and Linkable Format (ELF)
-
Standard binary format for object files
- One unified format for
- Relocatable object files (.o),
- Executable object files (a.out)
- Shared object files (.so)
- Generic name: ELF binaries
ELF Object File Format
- Elf header
- Word size, byte ordering, file type (.o, exec, .so), machine type, etc.
- Segment header table
- Page size, virtual address memory segments (sections), segment sizes.
- .text section
- Code
- .rodata section
- Read only data: jump tables, string constants, …
- .data section
- Initialized global variables
- .bss section
- Uninitialized global variables
- “Block Started by Symbol”
- “Better Save Space”
- Has section header but occupies no space
- .symtab section
- Symbol table
- Procedure and static variable names
- Section names and locations
- .rel.text section
- Relocation info for .text section
- Addresses of instructions that will need to be modified in the executable
- Instructions for modifying
- .rel.data section
- Relocation info for .data section
- Addresses of pointer data that will need to be modified in the merged executable
- .debug section
- Info for symbolic debugging (gcc -g)
- Section header table
- Offsets and sizes of each section
Linker Symbols
- Global symbols
- Symbols defined by module m that can be referenced by other modules.
- e.g., non-static C functions and non-static global variables.
- External symbols
- Global symbols that are referenced by module m but defined by some other module.
- Local symbols
- Symbols that are defined and referenced exclusively by module m.
- e.g, C functions and global variables defined with the static attribute.
- Local linker symbols are not local program variables
Local Symbols
- Local non-static C variables vs. local static C variables
- Local non-static C variables: stored on the stack
- Local static C variables: stored in either .bss or .data
static int x = 15;
int f() {
static int x = 17;
return x++;
}
int g() {
static int x = 19;
return x += 14;
}
int h() {
return x += 27;
}
Compiler allocates space in .data for each definition of x
Creates local symbols in the symbol table with unique names, e.g., x, x.1721 and x.1724.
How Linker Resolves Duplicate Symbol Definitions
- Program symbols are either strong or weak
- Strong: procedures and initialized globals
- Weak: uninitialized globals
- Or ones declared with specifier extern
Linker’s Symbol Rules
- Rule 1: Multiple strong symbols are not allowed
- Each item can be defined only once
- Otherwise: Linker error
- Rule 2: Given a strong symbol and multiple weak symbols,
choose the strong symbol
- References to the weak symbol resolve to the strong symbol
- Rule 3: If there are multiple weak symbols, pick an arbitrary
one
- Can override this with gcc –fno-common
Important: Linker does not do type checking.
Global Variables
- Avoid if you can
- Otherwise
- Use static if you can
- Initialize if you define a global variable
- Use extern if you reference an external global variable
- Treated as weak symbol
- But also causes linker error if not defined in some file
Loading Executable Object Files
Look at addresses returned by malloc: actually two heaps: have separate allocation algorithms for large objects and small objects.
Libraries: Packaging a Set of Functions
Old-Fashioned Solution: Static Libraries
Static libraries (.a archive files)
- Concatenate related relocatable object files into a single file with an index (called an archive).
- Enhance linker so that it tries to resolve unresolved external references by looking for the symbols in one or more archives.
- If an archive member file resolves reference, link it into the executable.
Commonly Used Libraries
libc.a (the C standard library)
- 4.6 MB archive of 1496 object files.
- I/O, memory allocation, signal handling, string handling, data and time, random numbers, integer math
libm.a (the C math library)
- 2 MB archive of 444 object files.
- floating point math (sin, cos, tan, log, exp, sqrt, …)
Using Static Libraries
- Linker’s algorithm for resolving external references:
- Scan .o files and .a files in the command line order.
- During the scan, keep a list of the current unresolved references.
- As each new .o or .a file, obj, is encountered, try to resolve each unresolved reference in the list against the symbols defined in obj.
- If any entries in the unresolved list at end of scan, then error.
- Problem:
- Command line order matters!
- Moral: put libraries at the end of the command line.
unix> gcc -static -o prog2c -L. -lvector main2.o
main2.o: In function `main':
main2.c:(.text+0x19): undefined reference to `addvec'
collect2: error: ld returned 1 exit status
Modern Solution: Shared Libraries
- Static libraries have the following disadvantages:
- Duplication in the stored executables (every function needs libc)
- Duplication in the running executables
- Minor bug fixes of system libraries require each application to explicitly relink
- Rebuild everything with glibc?
- https://security.googleblog.com/2016/02/cve-2015-7547-glibcgetaddrinfo-stack.html
- Modern solution: shared libraries
- Object files that contain code and data that are loaded and linked into an application dynamically, at either load-time or run-time
- Also called: dynamic link libraries, DLLs, .so files
- Dynamic linking can occur when executable is first loaded and run (load-time linking)
- Common case for Linux, handled automatically by the dynamic linker (ld-linux.so)
- Standard C library (libc.so) usually dynamically linked
- Dynamic linking can also occur after program has begun (run-time linking)
- In Linux, this is done by calls to the dlopen() interface
- Distributing software
- High-performance web servers
- Runtime library interpositioning
- In Linux, this is done by calls to the dlopen() interface
- Shared library routines can be shared by multiple processes
- More on this when we learn about virtual memory
Where are the libraries found?
- Use “ldd” to find out:
unix> ldd prog
linux-vdso.so.1 => (0x00007ffcf2998000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f99ad927000)
/lib64/ld-linux-x86-64.so.2 (0x00007f99adcef000)
Dynamic Linking at Run-time
#include <stdio.h>
#include <stdlib.h>
#include <dlfcn.h>
int x[2] = {1, 2};
int y[2] = {3, 4};
int z[2];
int main(int argc, char** argv)
{
void *handle;
void (*addvec)(int *, int *, int *, int);
char *error;
/* Dynamically load the shared library that contains addvec() */
handle = dlopen("./libvector.so", RTLD_LAZY);
if (!handle) {
fprintf(stderr, "%s\n", dlerror());
exit(1);
}
. . .
/* Get a pointer to the addvec() function we just loaded */
addvec = dlsym(handle, "addvec");
if ((error = dlerror()) != NULL) {
fprintf(stderr, "%s\n", error);
exit(1);
}
/* Now we can call addvec() just like any other function */
addvec(x, y, z, 2);
printf("z = [%d %d]\n", z[0], z[1]);
/* Unload the shared library */
if (dlclose(handle) < 0) {
fprintf(stderr, "%s\n", dlerror());
exit(1);
}
return 0;
}
Library interpositioning
powerful linking technique that allows programmers to intercept calls to arbitrary functions
Interpositioning can occur at:
- Compile time: When the source code is compiled
- Link time: When the relocatable object files are statically linked to form an executable object file
- Load/run time: When an executable object file is loaded into memory, dynamically linked, and then executed.
Some Interpositioning Applications
- Security
- Confinement (sandboxing)
- Behind the scenes encryption
- Debugging
- In 2014, two Facebook engineers debugged a treacherous 1-year old bug in their iPhone app using interpositioning
- Code in the SPDY networking stack was writing to the wrong location
- Solved by intercepting calls to Posix write functions (write, writev, pwrite)
Source: Facebook engineering blog post at: https://code.facebook.com/posts/313033472212144/debugging-file-corruption-on-ios/
- Monitoring and Profiling
- Count number of calls to functions
- Characterize call sites and arguments to functions
- Malloc tracing
- Detecting memory leaks
- Generating address traces
- Error Checking
- C Programming Lab used customized versions of malloc/free to do careful error checking
- Other labs (malloc, shell, proxy) also use interpositioning to enhance checking capabilities
- The “-Wl” flag passes argument to linker, replacing each comma with a space.
- The “–wrap,malloc ” arg instructs linker to resolve
references in a special way:
- Refs to malloc should be resolved as __wrap_malloc
- Refs to __real_malloc should be resolved as malloc
The LD_PRELOAD environment variable tells the dynamic linker to resolve unresolved refs (e.g., to malloc)by looking in mymalloc.so first.
Lecture 22 Network Programming
A Client-Server Transaction
Most network applications are based on the client-server model:
Note: clients and servers are processes running on hosts (can be the same or different hosts)
Global IP Internet (upper case)
- Based on the TCP/IP protocol family
- IP (Internet Protocol)
- Provides basic naming scheme and unreliable delivery capability of packets (datagrams) from host-to-host
- UDP (Unreliable Datagram Protocol)
- Uses IP to provide unreliable datagram delivery from process-to-process
- TCP (Transmission Control Protocol)
- Uses IP to provide reliable byte streams from process-to-process over connections
- IP (Internet Protocol)
- Accessed via a mix of Unix file I/O and functions from the sockets interface
/* Internet address structure */
struct in_addr {
uint32_t s_addr; /* network byte order (big-endian) */
};
hostname -i 127.0.1.1 hostname omnisky nslookup localhost Server: 8.8.8.8 Address: 8.8.8.8#53
** server can’t find localhost: NXDOMAIN Use getaddrinfo and getnameinfo functions (described later) to convert between IP addresses and dotted decimal format.
Domain Naming System (DNS)
The Internet maintains a mapping between IP addresses and domain names in a huge worldwide distributed database called DNS
Conceptually, programmers can view the DNS database as a collection of millions of host entries. * In a mathematical sense, a host entry is an equivalence class of domain names and IP addresses.
Can explore properties of DNS mappings using nslookup
Each host has a locally defined domain name localhost which always maps to the loopback address 127.0.0.1
Internet Connections
- Clients and servers communicate by sending streams of bytes
over connections. Each connection is:
- Point-to-point: connects a pair of processes.
- Full-duplex: data can flow in both directions at the same time,
- Reliable: stream of bytes sent by the source is eventually received by the destination in the same order it was sent.
- A socket is an endpoint of a connection
- Socket address is an IPaddress:port pair
- A port is a 16-bit integer that identifies a process:
- Ephemeral port: Assigned automatically by client kernel when client makes a connection request.
- Well-known port: Associated with some service provided by a server (e.g., port 80 is associated with Web servers)
Mappings between well-known ports and service names is contained in the file /etc/services on each Linux machine.
Sockets Interface
Set of system-level functions used in conjunction with Unix I/O to build network applications.
Created in the early 80’s as part of the original Berkeley distribution of Unix that contained an early version of the Internet protocols.
Available on all modern systems(Unix variants, Windows, OS X, IOS, Android, ARM)
Remember: All Unix I/O devices, including networks, are modeled as files