Linux Basics
Introduction & History
One thing we need to understand is that Linux is not a complete operating system. It is a kernel: the core on which distributions and other operating systems are built. A kernel on its own is only as usable as a car engine without the surrounding car.
The main functions of the Linux kernel are to manage peripheral devices, handle communication with the processor and memory, schedule tasks to be executed, handle interrupt requests, and so on.
Linux is a Unix-like operating system kernel. Unix was a popular commercial product in the 1970s and 1980s, and it influenced the design of a number of other later systems. One of the systems that was influenced by Unix was called Minix. Minix was created by Andrew S. Tanenbaum for educational purposes. Tanenbaum made its complete source code available to universities for study in courses and research.
One Finnish university student, Linus Torvalds, had used Minix but wanted to deviate from its existing architecture. Torvalds decided to develop his own monolithic kernel (Minix had a microkernel architecture).
In a microkernel architecture, the kernel is broken down into separate processes, known as servers. Some of the servers run in kernel-space and some run in user-space. All servers are kept separate and run in different address spaces. Servers invoke “services” from each other by sending messages via Inter-Process Communication (IPC). This separation has the advantage that if one server fails, other servers can still work efficiently.
More: https://en.wikipedia.org/wiki/Microkernel
A monolithic kernel, on the other hand, is one large process running entirely in a single address space. It is a single, static, binary file. All kernel services exist and execute in the kernel address space and the kernel can invoke functions directly.
Around the same time that Torvalds was working on his monolithic kernel, the GNU project was preparing to offer a free Unix-like operating system that included a collection of free software programs like system utilities, text editors, and others.
GNU developers started to develop a kernel called Hurd, which was based on the microkernel design. When the Linux kernel became production-ready earlier than expected, the project decided to choose Linux as the kernel for the GNU operating system. The term GNU/Linux refers to the GNU operating system using the Linux kernel.
The integration of the kernel, the operating system, system utilities, and other software packages is called a distribution. When we use the term distribution, we are referring to a working system that can be installed, boots itself, and provides additional software.
If we were to search online for Linux operating systems/distributions, we would find many of them, each optimized for specific tasks. For example, one Linux distribution may be optimized for command line use, while another is best suited for a completely different application. There are graphical desktop versions of a Linux-based operating system that may not make sense to use on a server where someone only needs the command line to accomplish certain tasks.
Command Line
One commonly used name for the command line or the terminal is a shell. Technically speaking, a shell is a program that processes commands and returns output, but the term is also used colloquially as a synonym for terminal or console.
There are a few important shells on Linux:
- sh: The Bourne SHell is the foundation for almost all other shell environments. It covers the core tasks of interpreting commands and serving as a scripting language.
- Bash: Also known as the Bourne-Again SHell, Bash was developed as a replacement for the Bourne SHell, offering additional functionality and better syntax.
- ksh: The Korn SHell is another shell environment that adds functionality to the basic sh. For example, ksh handles the loop syntax better than Bash.
- zsh: The Z SHell is an extended Bourne SHell with additional improvements and functionality, some of which build on features from Bash.
Usage of man
Many Linux programs have built-in manuals, also known as man pages. We can use the man command to open a man page in our terminal window. Generally, each man page will have a name, a synopsis, a description of the command’s purpose, and the corresponding options, parameters, or switches.
Man pages contain information about user commands, and also documentation regarding system administration commands, programming interfaces, and more. Manuals are categorized by several numbered sections.
Section | Contents |
---|---|
1 | User Commands |
2 | Programming interfaces for kernel system calls |
3 | Programming interfaces to the C library |
4 | Special files such as device nodes and drivers |
5 | File formats |
6 | Games and amusements such as screen-savers |
7 | Miscellaneous |
8 | System administration commands |
If we use the -k option with man, we can perform a keyword search, for example man -k passwd.
We can narrow the search with the help of a regular expression, for example man -k '^passwd$'.
To search for a string inside a manual, open the manual (e.g. man ls), then type /your-search-string and press Enter.
Filesystem Hierarchy Standard (FHS)
- /bin/: basic programs
- /boot/: Linux kernel and other files required for its early boot process
- /dev/: device files
- /etc/: configuration files
- /home/: user’s personal files
- /lib/: basic libraries
- /media/: mount points for removable devices (CD/DVD-ROM, USB keys, and so on)
- /mnt/ or /mount/: temporary mount point
- /opt/: extra applications provided by third parties
- /root/: administrator’s (root’s) personal files
- /run/: volatile runtime data that does not persist across reboots (part of the FHS since version 3.0)
- /sbin/: system programs
- /srv/: data used by servers hosted on this system
- /tmp/: temporary files (this directory is often emptied at boot)
- /usr/: applications (this directory is further subdivided into bin, sbin, lib according to the same logic as in the root directory)
- /usr/share/ contains architecture-independent data.
- /usr/local/ directory is meant to be used by the administrator for installing applications manually without overwriting files handled by the packaging system (dpkg).
- /var/: variable data handled by services. This includes log files, queues, spools, and caches.
- /proc/ and /sys/ are specific to the Linux kernel (and not part of the FHS). They are used by the kernel for exporting data to user space.
More: https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard
Linux File Management
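If we create a new symbolic link, or symlink, with ln -s ~/original.txt symlink.txt, the link’s own permission bits are not used for access checks on Linux; the permissions of the original file apply. Running chmod on the symlink changes the permissions of the original file, while the permissions shown for the link itself do not change.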
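The behavior of symlink permissions can be checked with a small experiment. This is a minimal sketch that assumes a Linux system with GNU coreutils (for stat -c); the file names and the temporary directory are made up for the illustration.

```shell
# Work in a throwaway directory so nothing in the home directory is touched.
workdir=$(mktemp -d)
echo "some data" > "$workdir/original.txt"
ln -s "$workdir/original.txt" "$workdir/symlink.txt"

chmod 644 "$workdir/original.txt"
# chmod follows the symlink, so this changes the mode of original.txt.
chmod 600 "$workdir/symlink.txt"

target_mode=$(stat -c '%a' "$workdir/original.txt")  # mode of the real file
link_mode=$(stat -c '%A' "$workdir/symlink.txt")     # mode shown for the link itself
echo "target: $target_mode"
echo "link:   $link_mode"

rm -rf "$workdir"
```

The target ends up with mode 600, while the link itself still shows lrwxrwxrwx.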
More: https://superuser.com/questions/303040/how-do-file-permissions-apply-to-symlinks
Finding Files
The three most common commands used to locate files in Kali Linux are find, locate, and which. These utilities are similar, but they work and return data in different ways, so they are useful in different circumstances.
The which command searches through the directories that are defined in the $PATH environment variable for a given file name. If a match is found, which returns the full path to the file.
The locate command is the quickest way to find the location of files or directories in Kali. In order to provide a much shorter search time, locate searches a built-in database named locate.db rather than the entire hard disk itself. This database is automatically updated on a regular basis by an automated task.
The find program enables us to walk a file hierarchy recursively in order to search for files and directories. It takes many arguments and the usage can be very complex. Some of its key options are:
- -name: Search by filename or directory name (case sensitive).
- -iname: Search by filename or directory name (case insensitive).
- -type f/d/l/s: Search by type, which can be files (f), directories (d), symbolic links (l), or sockets (s).
- -size: Search by file or directory size.
- -mtime: Search by last-modified time.
- -o: Logical OR, which allows us to combine multiple tests, such as several values for the same option.
- -user: Find files and directories based on their owner.
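A few of these options can be tried against a small throwaway tree. This is a minimal sketch; the directory and file names are invented for the example, and a case-sensitive filesystem (the norm on Linux) is assumed.

```shell
tree=$(mktemp -d)
mkdir -p "$tree/docs" "$tree/Logs"
touch "$tree/docs/notes.txt" "$tree/docs/NOTES.TXT" "$tree/Logs/app.log"

# -name is case sensitive: only notes.txt matches.
by_name=$(find "$tree" -type f -name 'notes.txt' | wc -l)
# -iname ignores case: notes.txt and NOTES.TXT both match.
by_iname=$(find "$tree" -type f -iname 'notes.txt' | wc -l)
# -type d lists directories only (the tree root, docs, and Logs).
dirs=$(find "$tree" -type d | wc -l)
# -o joins two -name tests with a logical OR; the parentheses group them.
txt_or_log=$(find "$tree" -type f \( -name '*.txt' -o -name '*.log' \) | wc -l)

echo "name=$by_name iname=$by_iname dirs=$dirs txt_or_log=$txt_or_log"
rm -rf "$tree"
```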
Redirection
It is important to understand that every program that runs on the command line in Linux-based systems automatically has three data streams connected to it. Each data stream is also assigned a file descriptor integer value:
- STDIN (0): This is the standard input on which data is fed into the program. Essentially, this is the part of the terminal accepting the text we type in.
- STDOUT (1): The standard output on the other hand is just how data is printed by the program, which defaults to the terminal.
- STDERR (2): Lastly, standard error is for error messages, which also gets printed to the terminal by default.
The > operator, as in echo "Hello :)" > hello.txt, redirects what a program writes to the STDOUT stream into a file instead of printing it to the screen. If we redirect the output to a non-existent file, the file will be created automatically. Notice, however, that if the file already exists, our redirect will replace the file’s content.
To append additional data to an existing file, we use the >> operator.
Where > writes the output of a program to a file, the < operator reads a file and uses its contents as the input to a program.
To redirect errors instead of a program’s output to a file, we use the 2> operator.
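The four operators can be seen side by side in a short script. This is a minimal sketch; the file names are hypothetical and everything happens in a temporary directory.

```shell
workdir=$(mktemp -d)

echo "first line" > "$workdir/out.txt"     # > creates (or overwrites) the file
echo "second line" >> "$workdir/out.txt"   # >> appends to it
line_count=$(wc -l < "$workdir/out.txt")   # < feeds the file to wc on STDIN

# ls fails here, so its message goes to STDERR, which 2> sends to err.txt.
# "|| true" keeps the script going despite the non-zero exit status.
ls "$workdir/does-not-exist" 2> "$workdir/err.txt" || true
err_size=$(wc -c < "$workdir/err.txt")

echo "lines in out.txt: $line_count"
echo "bytes in err.txt: $err_size"
rm -rf "$workdir"
```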
Searching and Text Manipulation
grep searches text files for a given regular expression and outputs any line containing a match to the standard output, which is usually the terminal screen.
Some of the most commonly used switches with grep include -r for recursive searching in a directory, and -i to ignore text case.
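The effect of -r and -i can be sketched with two small files in a temporary directory. The file names and contents are invented for the example.

```shell
notes=$(mktemp -d)
mkdir "$notes/sub"
printf 'Linux kernel\nminix\n' > "$notes/a.txt"
printf 'LINUX distro\n' > "$notes/sub/b.txt"

# -r descends into sub/; the match is case sensitive, so only a.txt matches.
sensitive=$(grep -r 'Linux' "$notes" | wc -l)
# Adding -i also matches the upper-case line in sub/b.txt.
insensitive=$(grep -r -i 'linux' "$notes" | wc -l)

echo "case sensitive: $sensitive, case insensitive: $insensitive"
rm -rf "$notes"
```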
sed is a powerful stream editor. At a very high level, sed performs text editing on a stream of text, which comes either from a set of specific files or from standard input (for example, the output of another command).
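As a minimal sketch, we can pipe a line of text into sed and replace one word with another using the s (substitute) command. The sample sentence is made up.

```shell
# s/old/new/ replaces the first occurrence of "old" on each line.
edited=$(echo "I need to try hard" | sed 's/hard/harder/')
echo "$edited"
```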
The cut command is simple, but often comes in quite handy. It is used to extract a section of text from a line and write it to standard output. The most commonly used switches are -f, for the field number we are cutting, and -d, for the field delimiter.
To see it in action, let’s echo a line of text and pipe it to cut. We’ll extract the second field using a comma (,) as the field delimiter.
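One way this could look, with an invented sample line:

```shell
# -d sets the delimiter to a comma, -f 2 selects the second field.
second_field=$(echo "alpha,bravo,charlie" | cut -d ',' -f 2)
echo "$second_field"
```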
awk is a programming language designed for text processing and is typically used as a data extraction and reporting tool. It happens to be extremely powerful, and has significantly more functionality than we can demonstrate here. Two commonly used features are the -F switch, which sets the field separator, and the print action, which outputs the resulting text.
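A minimal sketch of both features, using a made-up input string with :: as the separator:

```shell
# -F '::' splits each line on "::"; print outputs the first and third fields,
# separated by a space because of the comma between $1 and $3.
picked=$(echo "hello::there::friend" | awk -F '::' '{print $1, $3}')
echo "$picked"
```

Unlike cut, awk accepts a multi-character separator, which is why it is often the better fit for input like this.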
The comm command compares two text files, displaying the lines that are unique to each one, as well as the lines they have in common. It outputs three columns:
- lines that are unique to the first file or argument
- lines that are unique to the second file or argument
- lines that are shared by both files
The -n switch, where “n” is either 1, 2, or 3, can be used to suppress one or more columns, depending on the need.
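The column suppression can be sketched with two small files. The file names and contents are hypothetical; note that comm expects its input files to be sorted.

```shell
workdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$workdir/a.txt"
printf 'banana\ncherry\ndate\n' > "$workdir/b.txt"

# All three columns: unique to a.txt, unique to b.txt, shared.
comm "$workdir/a.txt" "$workdir/b.txt"

# -12 suppresses columns 1 and 2, leaving only the shared lines.
shared=$(comm -12 "$workdir/a.txt" "$workdir/b.txt")
# -23 suppresses columns 2 and 3, leaving lines unique to the first file.
only_first=$(comm -23 "$workdir/a.txt" "$workdir/b.txt")

echo "shared: $shared"
echo "only in a.txt: $only_first"
rm -rf "$workdir"
```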
The diff command is used to detect differences between files, similar to comm. However, diff is much more complex and supports many output formats. Two of the most popular formats include the context format (-c) and the unified format (-u).
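As a minimal sketch of the unified format, we can compare two hypothetical three-line files that differ in their second line.

```shell
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$workdir/old.txt"
printf 'one\n2\nthree\n' > "$workdir/new.txt"

# diff exits with status 1 when the files differ, so the status is captured
# here rather than being treated as a failure.
unified=$(diff -u "$workdir/old.txt" "$workdir/new.txt") && status=0 || status=$?

echo "$unified"
echo "exit status: $status"
rm -rf "$workdir"
```

In the unified output, lines starting with - come from the first file and lines starting with + come from the second.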
Users and Groups
/etc/passwd: Information about user accounts
Example entry: john:x:1002:1002:John Doe,,,:/home/john:/bin/bash
A colon separates the different properties. john is the username in plain text. x indicates that the password needs to be pulled from the shadow file. This is because the passwd file is world readable, meaning that any user can read its content, whereas the shadow file can only be accessed with high privileges. Continuing, the first 1002 is the User ID (UID), which is a unique number on the system for each account, and the second 1002 is the ID of the user’s primary group (GID). Additional group memberships are defined in the /etc/group file. John Doe is in an optional field called the comment field. It is most commonly used for informational purposes and usually contains the user’s full name. /home/john is the user’s home directory location, and /bin/bash is the default shell environment for the user.
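The field layout can be sketched by splitting the example entry on colons. The variable names below are chosen for this illustration only.

```shell
entry='john:x:1002:1002:John Doe,,,:/home/john:/bin/bash'

# Setting IFS to ":" for this one read splits the line into its seven fields.
IFS=: read -r user pw uid gid gecos home shell <<EOF
$entry
EOF

echo "user=$user uid=$uid gid=$gid"
echo "comment=$gecos"
echo "home=$home shell=$shell"
```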
It is important to note that the UID of 0 has a special role. It is always assigned to the system administrator superuser, called root. It is technically possible to manually set UID 0 for other users and thereby grant them elevated privileges, but it is not recommended.
/etc/shadow: Hashes of the passwords
Example entry: root:$6$pfiZTzNB1wav3OFG$GDwbvI44D7sBuX7Q.6LmNWx.RaU6nzxZWCCkkMNIXCkvANnNoYogV983NSLkG1cfpaW4mmyFuTOKkDf53hVkh/:18781:0:99999:7:::
Again, each part of this entry is separated by a colon. root is the username in plain text. The next piece, which is quite long, is the hashed password; the $6$ prefix identifies the hashing scheme (SHA-512 crypt). The next piece, 18781, is the date of the last password change, expressed as the number of days since January 1, 1970. 0 is the minimum number of days required between password changes, and 99999 is the maximum number of days the password is valid for. The last number, 7, indicates the number of days in advance of the password’s expiration date that the user will be warned that they will need to change their password.
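The day count in the third field can be converted into a calendar date. This is a minimal sketch that uses a shortened, made-up hash in place of the real one and assumes a date implementation that accepts -d @SECONDS (GNU date does).

```shell
# Hypothetical entry: the salt and hash are placeholders, the numbers are
# the ones from the example above.
entry='root:$6$examplesalt$examplehash:18781:0:99999:7:::'

# Field 2 is the hash; its first $-delimited token identifies the scheme.
hash_id=$(echo "$entry" | cut -d ':' -f 2 | cut -d '$' -f 2)
# Field 3 is the last password change in days since 1970-01-01.
last_change_days=$(echo "$entry" | cut -d ':' -f 3)
# Days times 86400 gives seconds since the epoch, which date can format.
last_change_date=$(date -u -d "@$((last_change_days * 86400))" +%F)

echo "hash scheme id: $hash_id"
echo "last change:    $last_change_date"
```

For 18781 days this prints 2021-06-03.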
Disable user accounts
As system administrators, one method we can use to control user accounts is to lock a user’s password. The usermod -L username and passwd -l username commands both place an exclamation mark (!) at the beginning of the password hash in /etc/shadow. This change can be manually applied to the file as well. The result is that any password authentication attempt will fail for the given user.
Another method is to mark the user account as expired. When an account expiration date is set, it is stored in the 8th field within /etc/shadow. We can use the chage command with the -E switch to set an expiration date for a user account. The easiest way to expire an account is to provide a date in the past.
A third method is to change the default shell in /etc/passwd either to /bin/false, which will exit immediately, or to /sbin/nologin, which is a simple program that displays a message saying that the account is currently not available. We can use the usermod command with the -s option to change the default shell of a user.
If we would like to know whether a user account is disabled or locked, we have to verify all three methods mentioned above. We can use the following commands to check for expiration dates, password-locks, and non-interactive shells.
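On a live system these checks need root privileges to read /etc/shadow, so the sketch below runs the same three tests against made-up sample entries for a hypothetical user bob instead. The entries, the user name, and the variable names are all assumptions for illustration.

```shell
# Hypothetical entries: locked hash ("!" prefix), expiry in field 8, nologin shell.
shadow_entry='bob:!$6$examplesalt$examplehash:18781:0:99999:7::18000:'
passwd_entry='bob:x:1003:1003::/home/bob:/sbin/nologin'

hash=$(echo "$shadow_entry" | cut -d ':' -f 2)
expire_days=$(echo "$shadow_entry" | cut -d ':' -f 8)
login_shell=$(echo "$passwd_entry" | cut -d ':' -f 7)
today_days=$(( $(date -u +%s) / 86400 ))   # today as days since 1970-01-01

# 1. Password lock: the hash starts with an exclamation mark.
case "$hash" in
  '!'*) locked=yes ;;
  *)    locked=no ;;
esac

# 2. Expiration: field 8 is set and lies in the past.
if [ -n "$expire_days" ] && [ "$expire_days" -le "$today_days" ]; then
  expired=yes
else
  expired=no
fi

# 3. Non-interactive shell: the login shell is nologin or false.
case "$login_shell" in
  */nologin|*/false) interactive=no ;;
  *)                 interactive=yes ;;
esac

echo "locked=$locked expired=$expired interactive=$interactive"
```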
/etc/group: Information about user groups
Example entry: bluetooth:x:117:kali
bluetooth is the group name, x is the group password (usually not used), and 117 is the group ID. kali is a particular user that belongs to the specified group.
Note that only users who have a secondary group membership are listed in /etc/group, since primary group memberships are stored in /etc/passwd.
Superuser-Do
We execute the id command four times to check our UID.
- The first time, we note that our kali user’s UID is 1000, which is often the default UID of the first human user on a Linux system.
- We then use sudo id to execute a single command as root. Since we are executing the command as root, id outputs root’s UID.
- However, this UID does not belong to kali, and so when we execute id as kali the third time, we get kali’s UID of 1000 once again.
- Finally, we execute sudo -i to give the kali user root’s login shell. Now when we run id for the fourth time, we are provided with root’s UID.
Notice also that the user’s prompt has changed from a $ character to a #. This convention identifies an elevated user on many Linux shells, including Bash and Zsh.
We can use su to execute a single command as the target user by combining the -l and -c options, for example su -l <user> -c "whoami".
File Permissions
Each file or directory has specific permissions for three categories of users.
- Its owner (symbolized by u, as in user)
- Its owner group (symbolized by g, as in group), representing all the members of the group
- The others (symbolized by o, as in other)
In addition to this, there are three types of rights that can be combined.
- reading (symbolized by r, as in read)
- writing (or modifying, symbolized by w, as in write)
- executing (symbolized by x, as in eXecute)
In the case of a file, these rights are easily understood: read access allows a user to read the content (including copying), write access allows changing it, and execute access allows running it (which will only work if it is a program).
A directory is handled differently from a file. Read access gives the right to consult the list of its contents (files and directories). Write access allows creating or deleting files. Finally, execute access allows crossing through the directory to access its contents (for example, with the cd command). Being able to cross through a directory without being able to read it gives the user permission to access the entries therein that are known by name, but not to find them without knowing their exact name.
The symbolic representation involves the letter symbols mentioned above. We can define rights for each category of users (u/g/o), by setting them explicitly (with =), by adding (with +), or subtracting (with -).
For example, we can use the u=rwx,g+rw,o-r formula to give the owner read, write, and execute rights, add read and write rights for the owner group, and remove read rights for other users.
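The formula can be tried on a scratch file. This is a minimal sketch that assumes GNU stat for reading back the mode; the starting mode of 644 is chosen here only so that the effect of + and - is visible.

```shell
f=$(mktemp)
chmod 644 "$f"                 # start from rw-r--r--
chmod u=rwx,g+rw,o-r "$f"      # set owner, add to group, remove from others
symbolic_result=$(stat -c '%A %a' "$f")
echo "$symbolic_result"
rm -f "$f"
```

Starting from 644, the owner becomes rwx, the group gains w on top of its existing r, and others lose r, which gives -rwxrw---- (760).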
The second way to represent rights is via an octal numeric representation. It associates each right with a value.
- 4 for read
- 2 for write
- 1 for execute
We associate each combination of rights with the sum of the three figures.
- 7 = 4 + 2 + 1 = read, write, and execute
- 6 = 4 + 2 = read and write
- 5 = 4 + 1 = read and execute
- 3 = 2 + 1 = write and execute
Finally, 0 represents no permissions.
Notice how there is only one way to obtain each of the combination numbers by adding together the individual components.
To set rights for each of the three different categories, we assign one of these numeric values to them in the usual order (owner, then group, then others).
For instance, the chmod 754 <file> command will set the following rights: read, write, and execute for the owner (since 7 = 4 + 2 + 1), read and execute for the group (since 5 = 4 + 1), and read-only for others.
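The arithmetic behind 754 can be written out and checked on a scratch file. This is a minimal sketch assuming GNU stat; the variable names are illustrative.

```shell
f=$(mktemp)

owner=$((4 + 2 + 1))   # read + write + execute
group=$((4 + 1))       # read + execute
others=4               # read only

chmod "${owner}${group}${others}" "$f"    # same as chmod 754
octal_result=$(stat -c '%a %A' "$f")
echo "$octal_result"
rm -f "$f"
```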
Setuid, setgid, and the Sticky Bit
Aside from the rwx permissions described above, there are two additional special rights that pertain to executable files: setuid and setgid. These are symbolized with the letter “s”.
If either right is set, an uppercase or lowercase “s” appears in the permissions in place of the corresponding “x”. The file then executes with the rights of its owner (setuid) or of its owner group (setgid) rather than those of the user who runs it.
If the setuid attribute is assigned to an executable owned by root, that program will run under the super-user identity. This means that any user who manages to subvert a setuid root program to call a command of their choice can effectively impersonate the root user and have all rights on the system. Penetration testers regularly search for these types of files when they gain access to a system as a way of escalating their privileges.
The lowercase “s”, which appears here, means both execute and setuid flags are set. A capital “S” would mean the setuid bit is set, but that the execute flag is missing.
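The difference between the two letters can be reproduced on a scratch file that we own, without root. This is a minimal sketch assuming GNU stat; the leading 4 in the octal mode is the setuid bit.

```shell
f=$(mktemp)

chmod 4755 "$f"                       # setuid plus rwxr-xr-x
with_exec=$(stat -c '%A' "$f")        # owner execute set: lowercase s

chmod 4644 "$f"                       # setuid plus rw-r--r--
without_exec=$(stat -c '%A' "$f")     # owner execute missing: capital S

echo "with execute:    $with_exec"
echo "without execute: $without_exec"
rm -f "$f"
```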
Example:
First, we made a copy of the id executable named idcopy, and put it in the /usr/bin directory. This allows us to invoke idcopy directly from the command line, since /usr/bin is inside our $PATH. We reviewed the permissions, which showed two things: the “s” is missing, and the owner of this file is kali.
Perhaps predictably, when we ran idcopy, the output showed that the user executing the command has UID 1000, which belongs to kali.
For our experiment to work, we will need to change the owner to root and then change the permissions so that this command runs with the permissions of the owner of the executable rather than the current user. We used sudo chown root:kali /usr/bin/idcopy to change the owner of this file to root.
The critical step here was setting the setuid bit with sudo chmod u+s /usr/bin/idcopy. When we reviewed the permissions again, we noted that the “s” is present and that the owner of this file is now root.
Finally, we ran /usr/bin/idcopy and noted that while the UID of the executing user remains 1000 (belonging to kali), the effective UID, or EUID, is now 0 (belonging to root). This means that the program ran as if root was the executor, even though it was invoked by kali.
The sticky bit (symbolized by the letter “t”) is a permission that is only useful in directories. It is commonly used for temporary directories where everybody has write access (such as /tmp/). It restricts deletion of files so that only their owner or the owner of the parent directory can delete them. Without this, everyone could delete each other’s files in /tmp/.
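As a minimal sketch (assuming GNU stat), the sticky bit can be set on a scratch directory with a leading 1 in the octal mode, which is the same mode /tmp/ typically carries.

```shell
d=$(mktemp -d)
chmod 1777 "$d"                    # world-writable plus the sticky bit
sticky_mode=$(stat -c '%A' "$d")   # the final character becomes t
echo "$sticky_mode"
rmdir "$d"
```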
More: How to Find Files With setuid Permissions
Linux Processes
The quickest way to background a process is to append an ampersand (&) to the end of the command to send it to the background immediately after it starts. Let’s try a brief example.
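One way such an example could look, using sleep as a stand-in for a long-running command. The variable names are illustrative.

```shell
sleep 1 &          # & returns control to the shell immediately
bg_pid=$!          # $! holds the PID of the most recent background command
echo "started background job with PID $bg_pid"

# kill -0 sends no signal; it only tests whether the process still exists.
if kill -0 "$bg_pid" 2>/dev/null; then
  still_running=yes
else
  still_running=no
fi

wait "$bg_pid"     # block until the background job finishes
bg_status=$?
echo "running right after start: $still_running, exit status: $bg_status"
```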
If we had not supplied the & symbol, the command would have run in the foreground, and we would be forced to either cancel the command with Ctrl+c, wait until the command finishes to regain control of the terminal, or suspend the job using Ctrl+z after it has already started.
Suspending a job pauses it until it is told to resume. Once a job has been suspended, we can resume it in the background by using the bg command.
The built-in jobs utility lists the jobs that are running in the current terminal session, and fg returns a job to the foreground.
One of the most useful commands to monitor processes on almost any Unix-like operating system is ps (short for process status). Unlike the jobs command, ps lists processes system-wide, not only for the current terminal session. This utility is considered a standard on Unix-like OSes and its name is so well-recognized that even on Windows PowerShell, ps is a predefined command alias for the Get-Process cmdlet, which essentially serves the same purpose.
File and Command Monitoring
The most common use of tail is to monitor log file entries as they are being written. For example, we may want to monitor the Apache logs to determine if a web server is being contacted by a given client we are attempting to attack via a client-side exploit. Let’s examine this practical example in order to understand how we might use tail once we’ve mastered it.
The -f option (follow) is very useful as it continuously updates the output as the target file grows. Another convenient switch is -nX, which outputs the last “X” number of lines, instead of the default value of 10.
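The line-count behavior can be sketched on a made-up 20-line file standing in for a log. The -f option is left out of the runnable part because it keeps running until interrupted.

```shell
log=$(mktemp)
seq 1 20 > "$log"                      # 20 numbered lines as a fake log

default_tail=$(tail "$log" | wc -l)    # no option: the last 10 lines
last_three=$(tail -n 3 "$log")         # -n 3: only the last 3 lines

echo "default line count: $default_tail"
echo "$last_three"
rm -f "$log"

# To follow a growing file interactively, one would run: tail -f <file>
# and stop it with Ctrl+c.
```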
The watch command is used to run a designated command at regular intervals. By default, it runs every 2 seconds, but we can specify a different interval by using the -n X option to have it run every “X” number of seconds. For example, watch -n 5 w will list logged-in users (via the w command) once every 5 seconds.
Scheduled Tasks
Scheduled tasks are listed under the /etc/cron.* directories, where “*” represents the frequency at which the task will run. For example, tasks that will be run daily can be found under the /etc/cron.daily directory. Each script is listed in its own subdirectory.
It is worth noting that system administrators often add their own scheduled tasks in the /etc/crontab file. These tasks should be inspected carefully for insecure file permissions as most jobs in this particular file will run as root.
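One way to look for insecure permissions is to search for world-writable files with find. The sketch below builds a fake cron directory with two invented scripts so it can run anywhere; on a real system the search would be pointed at /etc/crontab, the /etc/cron.* directories, and any scripts they reference.

```shell
crondir=$(mktemp -d)    # stand-in for a directory of scheduled scripts
printf '#!/bin/sh\necho backup\n' > "$crondir/backup.sh"
printf '#!/bin/sh\necho cleanup\n' > "$crondir/cleanup.sh"
chmod 755 "$crondir/backup.sh"
chmod 777 "$crondir/cleanup.sh"   # world-writable: any user could edit it

# -perm -0002 matches files whose "others" write bit is set.
writable=$(find "$crondir" -type f -perm -0002)
writable_name=$(basename "$writable")

echo "world-writable script: $writable_name"
rm -rf "$crondir"
```

A world-writable script that root runs on a schedule lets any local user replace its contents and have them executed as root.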
Logs
Most Unix-like systems, as well as services running on them, produce logs within the /var/log directory.
We can retrieve the kernel logs with the dmesg command.
systemd also stores multiple logs (stdout/stderr output of services, syslog messages, kernel logs) and makes it easy to query them with journalctl.
Disk Management
The free command displays information on memory. We can use either the -m or -g options to display the data in mebibytes or in gibibytes, respectively.
df (which stands for “disk free”) reports on the available disk space on each of the disks mounted in the file system. Its -h option (for human readable) converts the sizes into a more legible unit, usually mebibytes or gibibytes.
There are a few other commands and options we can use as well.
- dd is mostly used to make a raw, block-level copy of a device file.
- du can be used to determine the size of files and directories. The -hs option is typically used to make the output more human readable.
- df and its -T option can be used to show the type of the filesystem.
One of the key differences between Linux and other OSs, is that in Linux we have to mount a filesystem before we can use it. Since Linux systems have a single directory tree, if we were to insert a USB drive (for example), we would need to create an associated location somewhere in that tree. Creating that associated location is called mounting.
The mount command can be used to display the currently mounted filesystems and their types. It can also be used to mount partitions or image disk files to a mount point.
fdisk can be used to gain information about inserted devices (like USB drives).