Overview
C:\>attrib c:\*.*
A  SH        C:\DumpStack.log
A  SH        C:\DumpStack.log.tmp
A  SH  I     C:\hiberfil.sys
A  SH        C:\pagefile.sys
A  SH        C:\swapfile.sys
Copying a 2 GB file (2,147,483,648 bytes), SSD:
Filesystem  Time  MB/s
----------------------
xfs         0:19   110
btrfs       0:19   110
hfs+        0:19   110
ext4        0:21    95
ext2        0:22    95
jfs         0:24    87
FAT32       0:25    84
reiserfs    0:25    84
ext3        0:27    78
NTFS        0:40    53
zfs         0:47    45

Copying ~500 MB of files (Linux kernel 3.2.34, ~40,000 files), SSD:
Filesystem  Time  MB/s
----------------------
btrfs       0:05    79
ext4        0:07    58
hfs+        0:12    35
FAT32       0:14    30
reiserfs    0:14    30
jfs         0:18    25
zfs         0:24    18
ext3        0:25    18
ext2        0:25    17
xfs         0:27    15
NTFS        2:23     3

Copying a 16 GB file using cp (NVMe):
Filesystem  Time    MB/s
------------------------
xfs         0:10.1  1700
btrfs       0:10.6  1616
hfs+        0:13.2  1302
ext4        0:14.9  1153
ext3        0:16.9  1014
jfs         0:17.3   996
ext2        0:17.4   987
reiserfs    0:21.5   799
NTFS        1:35.1   181
FAT32       File is too big

Copying a 16 GB file using rsync (NVMe):
Filesystem  Time    MB/s
------------------------
xfs         0:51.1   336
hfs+        0:52.0   331
btrfs       0:52.9   325
ext2        0:54.3   317
jfs         0:54.8   313
ext4        0:56.0   307
ext3        0:56.2   306
reiserfs    0:59.2   290
NTFS        2:26.5   117
FAT32       File is too big
Filesystem  Time (secs)  Total bytes        Used            Available          In use (gparted)
------------------------------------------------------------------------------------------------------------
btrfs         0.452      1,000,203,091,968      17,301,504    998,024,937,472   16.50 MiB      17,301,504
hfs+          2.214      1,000,203,091,968     104,984,576  1,000,098,107,392  100.12 MiB     104,983,429
ntfs          2.544      1,000,203,087,872      98,095,104  1,000,104,992,768   93.55 MiB      98,094,284
f2fs          2.779      1,000,202,043,392  49,994,014,720    947,992,387,584  Unknown        Unknown
jfs           5.756      1,000,038,141,952     122,331,136    999,915,810,816  273.97 MiB     287,278,366
ext4          9.814        984,373,075,968      75,124,736    934,271,021,056   14.81 GiB  15,902,116,414
xfs          11.418        999,714,713,600      35,028,992    999,679,684,608  499.16 MiB     523,407,196
fat32        18.326        999,958,937,600          32,768    999,958,904,832  232.88 MiB     244,192,378
reiserfs     79.393      1,000,172,560,384      33,628,160  1,000,138,932,224   61.19 MiB      64,162,365
ext3        413.841        984,373,075,968      75,259,904    934,287,663,104   14.81 GiB  15,902,116,414
ext2        416.520        984,507,293,696      75,124,736    934,422,016,000   14.69 GiB  15,773,267,394
"Truths" (T) vs. Data (D)
T: Most files are small
D: Roughly 2K is the most common size
T: The average file size is growing
D: Almost 200K is the average size
T: Most bytes are stored in large files
D: A few big files use the most space
T: File systems contain lots of files
D: Almost 100K on average
T: File systems are roughly half full
D: Even as disks grow, file systems remain 50% full
T: Directories are typically small
D: Many have few entries; most have 20 or fewer
Virtual File System Layer
Operating System Concepts - 8th Edition Silberschatz, Galvin, Gagne ©2009
The user could have simply called printf instead, which passes through several library functions: printf → fprintf → puts → fwrite → etc.
Various filesystems: FAT, FAT32, NTFS, APFS, ext4, XFS, Btrfs, ZFS, etc.
A Brief Unix/Linux/macOS Example
Note: The multiple levels of indirection shown above are also how B-Trees (a tree-like data structure for very large data sets) work. Many filesystems are implemented using B-Trees or similar data structures that use extents for greater efficiency.
The relationship between directory entries, inodes, and data blocks:
You can think of the directory entries as the Table of Contents of the file system. This is how the filesystem "looks up" the file by name and then follows the pointer (12345 in the example) to get to the metadata (inode), which leads to the data blocks.
The size of a pointer and the size of the disk blocks (either blocks of pointers or blocks of data, as they are the same size) determine the maximum size of the disk (filesystem) as well as the maximum size of a file. Given this information and the sizes below, answer the question:
* This value depends on how many inodes the filesystem has and is sometimes determined when the filesystem is created.
Pointer size  Block size   Max filesystem size*  Max file size
4 bytes       2,048 (2K)   Depends               ???
4 bytes       4,096 (4K)   Depends               ???
8 bytes       4,096 (4K)   Depends               ???
8 bytes       8,192 (8K)   Depends               ???
Every file in the system has a number of pointers to its data blocks. Find what the maximum number of pointers is (for a file) and then multiply that by the size of a disk block. That gives the size of the largest file. For a simplistic example, if you had a maximum of 1,000 pointers to data blocks, and each data block was 4,096 (4K) bytes, then the largest file would be:
1,000 x 4,096 = 4,096,000 bytes
It's just simple multiplication.
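The same arithmetic as a tiny C sketch. The helper names max_file_size and pointers_per_block are hypothetical, and how many pointers a file actually gets (and how they are split between direct and indirect blocks) depends on the filesystem, which is exactly what the table above asks you to work out.

```c
#include <stdint.h>

/* Largest file = (maximum number of data-block pointers) x (block size).
   How many pointers a file can have is filesystem-specific. */
uint64_t max_file_size(uint64_t max_pointers, uint64_t block_size)
{
    return max_pointers * block_size;
}

/* Number of pointers that fit in one block of pointers. */
uint64_t pointers_per_block(uint64_t block_size, uint64_t pointer_size)
{
    return block_size / pointer_size;
}
```

For example, max_file_size(1000, 4096) gives 4,096,000, matching the example, and pointers_per_block(4096, 4) gives 1,024, the number of 4-byte pointers that fit in a single 4K block of pointers.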
Self-check - Given all of this information, answer this question: "What is the maximum number of files that a filesystem can hold?" The answer is not simply a number; it's an explanation. Think of it like this: "How many files of zero length can the filesystem hold?" That will give you the answer. (Hint: It's not unlimited or infinite!)
Bonus: What is the command in Linux that will tell you this information?
Self-check - With multiple levels of indirection, filesystems can be implemented efficiently for fragmented files. However, for non-fragmented (i.e., contiguous) files, this approach is not very efficient. Explain why that is and how a better method can be used.
Self-check - For very small files (just a few bytes), there is a lot of overhead necessary to keep track of them using this scheme. Can you think of a simple optimization that could reduce the overhead for files that are very small, say, less than 100 bytes? Many systems have lots of very small files; symbolic links (shortcuts) are a common example.
For reference, this is somewhat related to how using a doubly-linked list to store single characters causes a lot of overhead. With 8-byte pointers, each node in the list would require 24 bytes (the 1-byte character plus 2 pointers and padding/alignment) just to hold a single character. That's 23 wasted bytes out of 24, or roughly 96% overhead!
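A quick way to confirm the 24-byte figure is to ask the compiler. This sketch assumes a typical 64-bit platform (8-byte pointers, 8-byte alignment); elsewhere sizeof will differ. The struct and function names are illustrative.

```c
#include <stddef.h>

/* One doubly-linked-list node holding a single character. */
struct char_node {
    struct char_node *next;  /* 8 bytes on a 64-bit platform */
    struct char_node *prev;  /* 8 bytes */
    char c;                  /* 1 useful byte (followed by 7 bytes of padding) */
};

/* Percentage of each node that is NOT the character itself. */
double node_overhead_percent(void)
{
    size_t total = sizeof(struct char_node);
    return 100.0 * (double)(total - sizeof(char)) / (double)total;
}
```

On such a platform, sizeof(struct char_node) is 24 and node_overhead_percent() returns about 95.8, which rounds to the 96% quoted above.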
A simple filesystem implementation.
You can think of a directory as kind of a table of contents or index. In fact, that's kind of the definition of a directory. You've probably seen these directories inside of buildings. You walk in the front door, and the first thing you see (on a wall or sign) is a directory of people/companies in the building. This directory lists the name of the person/company and then a room number (maybe with a floor number, as well). It allows you to quickly and easily find the person you are looking for. The room could be a small closet (small file) or a 10,000 sq. ft. office (large file).
Name     inode   Type
count    1345    d
find     656323  d
hex      87343   d
reorder  856     d
struct dirent {
    ino_t          d_ino;       /* Inode number */
    off_t          d_off;       /* Not really an offset; an opaque value (see readdir(3)) */
    unsigned short d_reclen;    /* Length of this record */
    unsigned char  d_type;      /* Type of file; not supported by all filesystem types */
    char           d_name[256]; /* NUL-terminated filename */
};
#include <stdio.h>  /* printf, perror */
#include <dirent.h> /* opendir, readdir, closedir */
#include <stdlib.h> /* exit */

/* For human-readable names, indexed by d_type (DT_UNKNOWN = 0, DT_FIFO = 1, ...) */
static const char *ENT_TYPE[] = {" UNK", "FIFO", "CHAR", "", " DIR", "",
                                 " BLK", "", " REG", "", " LNK", "", "SOCK"};
#define NUM_ENT_TYPES (sizeof(ENT_TYPE) / sizeof(ENT_TYPE[0]))

int main(int argc, char **argv)
{
  const char *dirname = "."; /* Default directory to process */
  DIR *dir;                  /* Like a FILE *, but for directories */
  struct dirent *ent;        /* A directory entry is a file */
  int count = 0;             /* Number of files processed */

  /* Optional directory to process, defaults to cwd */
  if (argc > 1)
    dirname = argv[1];

  /* Open the directory */
  dir = opendir(dirname);
  if (!dir)
  {
    perror(dirname);
    exit(1);
  }

  /* Read each directory entry (file) and print out some info */
  while (1)
  {
    /* Get next file */
    ent = readdir(dir);
    if (!ent)
      break;

    printf("%4i: inode: %12lu, offset: %20ld, len: %2hu, type: (%2i) %s, name: %s\n",
           ++count,
           ent->d_ino == (ino_t)-1 ? 0UL : (unsigned long)ent->d_ino, /* inode number */
           (long)ent->d_off,  /* for internal use; signed, so print with %ld */
           ent->d_reclen,     /* length of this record */
           ent->d_type,       /* file type (numeric) */
           ent->d_type < NUM_ENT_TYPES ? ENT_TYPE[ent->d_type] : " ???", /* file type (text) */
           ent->d_name        /* NUL-terminated filename */
           );
  }

  /* Done */
  closedir(dir);
  return 0;
}
Most of the information is usually not relevant to application programs. The filename
and type are useful. Once you have that information, you can call the
stat function to retrieve all of the other information (e.g.,
file size, permissions, date/time stamps, etc.).
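Putting readdir() and stat() together might look like this minimal sketch. list_with_sizes is a hypothetical name, and the 4,096-byte path buffer is an arbitrary assumption. Because readdir() supplies only the name, the sketch builds a path from the directory and the name and asks stat() for the size.

```c
#include <dirent.h>
#include <stdio.h>
#include <sys/stat.h>

/* List each entry in dirname with its size: readdir() gives the name,
   stat() gives everything else. Returns the number of entries listed,
   or -1 if the directory can't be opened. */
int list_with_sizes(const char *dirname)
{
    DIR *dir = opendir(dirname);
    struct dirent *ent;
    int count = 0;

    if (!dir)
    {
        perror(dirname);
        return -1;
    }

    while ((ent = readdir(dir)) != NULL)
    {
        char path[4096];
        struct stat st;

        /* stat() needs a path, not just the bare name from the entry */
        snprintf(path, sizeof(path), "%s/%s", dirname, ent->d_name);
        if (stat(path, &st) == -1)
        {
            perror(path);
            continue;
        }
        printf("%12lld  %s\n", (long long)st.st_size, ent->d_name);
        count++;
    }

    closedir(dir);
    return count;
}
```

Calling list_with_sizes(".") prints one line per entry (including . and ..) and returns how many entries it listed.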
Case Study: ext2/ext3/ext4 Filesystem
The first filesystem developed specifically for Linux was the ext filesystem (extended filesystem), which was based on the Unix filesystem (a.k.a. the Berkeley Fast File System or FFS). Then, the ext2 filesystem enhanced ext further with more features from the FFS. Next came the ext3 filesystem, which added more improvements, especially journaling. After that came the ext4 filesystem, which added several more improvements, most notably extents. Because the data structures have been (for the most part) compatible across the three filesystems (and we aren't interested in the other features yet), talking about ext4 will be very similar to discussing the structure of ext2/ext3 systems.
The ext4 filesystem is a very stable and mature filesystem used by many Linux distributions. It's not the best (if there even is a "best" filesystem), the fastest, or the most feature-rich filesystem, but it's fairly efficient and fairly straightforward to understand and implement (if you're an operating systems implementer). Many more powerful/complex filesystems share attributes with ext4. By understanding the basics of this filesystem, you'll be more likely to understand how other filesystems work and what they have done to improve upon ext4.
So, with that said, let's see just how much work the filesystem must do in order
to simply display the contents of a simple text file.
We'll use this reference system for the demonstration:
chico@nina ~ $ ls -l /
total 258,048
drwxr-xr-x 2 root root 4,096 Apr 9 2019 bin
drwxr-xr-x 3 root root 4,096 Apr 9 2019 boot
drwxr-xr-x 2 root root 4,096 Aug 23 2015 cdrom
drwxr-xr-x 17 root root 4,640 Oct 1 11:56 dev
drwxr-xr-x 213 root root 12,288 Oct 8 13:17 etc
drwxr-xr-x 10 root root 4,096 Oct 8 13:20 home
drwxr-xr-x 8 root root 4,096 Oct 8 13:20 homes
drwxr-xr-x 27 root root 4,096 Apr 16 2019 lib
drwxr-xr-x 2 root root 4,096 Apr 9 2019 lib32
drwxr-xr-x 2 root root 4,096 Apr 9 2019 lib64
drwxr-xr-x 2 root root 4,096 Apr 9 2019 libx32
drwxr-xr-x 2 root root 16,384 Feb 18 2017 lost+found
drwxr-xr-x 6 root root 4,096 Jul 8 2018 media
[several more lines removed . . .]
chico@nina ~ $
chico@nina ~ $ ls -l /homes
total 24,576
drwxr-xr-x 2 alvin alvin 4,096 Oct 8 13:20 alvin
drwxr-xr-x 2 betty betty 4,096 Oct 8 13:20 betty
drwxr-xr-x 8 chico chico 4,096 Oct 8 13:20 chico
drwxr-xr-x 2 fred fred 4,096 Oct 8 13:20 fred
drwxr-xr-x 2 veronica veronica 4,096 Oct 8 13:20 veronica
drwxr-xr-x 2 wilma wilma 4,096 Oct 8 13:20 wilma
chico@nina ~ $
Note: On a typical Linux system, a user's home directory is in the /home (singular) directory. However, for this example (and for technical reasons), I've created some "artificial" users in /homes (plural), which will make the details a little easier to explain and understand. Just keep that in mind if you're trying to find a /homes directory on your system, as it's unlikely to exist.
Let's see what's in chico's directory using the tree command:
chico@nina ~ $ tree /homes/chico
/homes/chico
├── bathroom
├── bedroom
├── garage
└── kitchen
    ├── cupboards
    ├── microwave
    ├── oven
    ├── refrigerator
    │   ├── apples
    │   ├── butter
    │   ├── cake
    │   ├── cheese
    │   ├── chicken
    │   ├── coke
    │   ├── eggs
    │   ├── juice
    │   ├── milk
    │   └── pie
    ├── sink
    └── stove

10 directories, 10 files
chico@nina ~ $
The file we're interested in is cake. The full path to cake is:
/homes/chico/kitchen/refrigerator/cake
and the command that we will use to display the contents:
cat /homes/chico/kitchen/refrigerator/cake
and the output:
eggs
butter
milk
flour
vanilla
icing
strawberries
peaches
lettuce
asparagus
Which is presumably all of the things that are in the cake! (Don't knock it until you've tried it!)
Note: I'm attempting to create an analogy/metaphor here. In chico's home (directory) there is a kitchen (directory), and in the kitchen there is a refrigerator (directory) and in the refrigerator there is a cake (file) that contains ingredients (lines of text).
So, the question is, "How many disk reads are required to locate (search), open (read), and display the file?" To answer that question, this is how we proceed.
Locating this file:
/usr/hostname
is going to require significantly less work than locating this file:
/usr/share/icons/foo/bar/baz/bat/one/more/dir/and/were/done/file.txt
The hostname file above only requires searching the root directory (/) and the usr directory. The file.txt requires searching the root directory and 13 other directories before getting to file.txt! That's a lot of work that must be done every time you access that file. Fortunately for users, it's all hidden behind the filesystem.
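Counting the directories that must be searched is just counting path components. The following is a sketch (dirs_to_search is a hypothetical helper) that assumes an absolute path and ignores ., .., symbolic links, and mount points, all of which a real kernel has to handle.

```c
/* Number of directories that must be searched to resolve an absolute
   path: "/" itself plus every component that is followed by another
   component. Ignores ".", "..", symbolic links, and mount points. */
int dirs_to_search(const char *path)
{
    int count = 1;            /* always search the root directory first */
    const char *p = path;

    while (*p == '/')         /* skip the leading slash(es) */
        p++;

    for (; *p != '\0'; p++)
    {
        /* a '/' followed by a real component means the component
           before it is a directory we have to search */
        if (*p == '/' && p[1] != '\0' && p[1] != '/')
            count++;
    }
    return count;
}
```

It returns 2 for /usr/hostname, 14 (the root plus 13 others) for the long file.txt path, and 5 for /homes/chico/kitchen/refrigerator/cake.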
The ext4 filesystem accomplishes this work using inodes and data blocks that were described above. Let's go through this step-by-step to see exactly what is going on. I'm going to use real data from one of my systems to show this process.
As you can imagine, there are a bunch of tools on a Linux system that will help us peer into the filesystem's data structures (inodes) and disk blocks. The first and simplest command is our trusty ls command. If you run ls -ld / (on the root directory), it will display something like this:
drwxr-xr-x 29 root root 4,096 Oct 9 13:00 /
If we add -i to the command, it will also show us the inode that contains the information about the root directory:
ls -ldi /
Output:
2 drwxr-xr-x 29 root root 4,096 Oct 9 13:00 /
This tells us that the root directory's inode is inode #2. By the way, the -d option tells ls to show information about the directory itself, not the contents of the directory. Without the option, ls lists the contents of the root directory instead (like the ls -l / output shown earlier).
Another way we could have found the inode is with the stat command:
stat /
Output:
File: '/'
Size: 4096 Blocks: 8 IO Block: 4096 directory
Device: 801h/2049d Inode: 2 Links: 29
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2019-05-18 12:13:53.741950988 -0700
Modify: 2020-10-09 13:00:01.918312997 -0700
Change: 2020-10-09 13:00:01.918312997 -0700
Birth: -
There's a lot of other information displayed as well, but for now, we're just concerned with the inode. (The IO Block: 4096 is also important, as it tells us how big each logical disk block is.)
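A program can get the same numbers that ls -i and stat printed by calling stat() itself. This is a minimal sketch; inode_info is a hypothetical helper, and st_blksize is the field that stat reports as "IO Block". On ext2/ext3/ext4 the root directory is always inode 2, but other filesystems may use a different number.

```c
#include <sys/stat.h>

/* Look up the fields that "ls -i" and "stat" displayed above.
   Returns 0 on success, or -1 if stat() fails (errno is set). */
int inode_info(const char *path, unsigned long *ino, unsigned long *nlink,
               long *io_block)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;

    *ino = (unsigned long)st.st_ino;      /* "Inode:" */
    *nlink = (unsigned long)st.st_nlink;  /* "Links:" */
    *io_block = (long)st.st_blksize;      /* "IO Block:" */
    return 0;
}
```

On the reference system, inode_info("/", &ino, &nlink, &io_block) would be expected to report 2, 29, and 4096.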
OK, so we have the inode, but where on the disk is that inode? This is where the next tool comes in handy. It's called debugfs and it's used to help debug (or simply glean information about) ext2/ext3/ext4 filesystems. This is the command that will map the inode number to a disk block:
sudo debugfs -R 'imap <2>' /dev/sda1
This command essentially runs debugfs and tells it to map inode #2 to its corresponding disk block on /dev/sda1, which is the first partition on the first hard drive in the system. If you want to see all of the partitions on all of the drives, just run the lsblk command and you'll see something like this:
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    0 931.5G  0 disk
├─sda1   8:1    0  39.1G  0 part /
├─sda2   8:2    0  15.6G  0 part
├─sda3   8:3    0  39.1G  0 part
├─sda4   8:4    0     1K  0 part
├─sda5   8:5    0 781.3G  0 part /home
├─sda6   8:6    0    41G  0 part /opt
└─sda7   8:7    0  15.5G  0 part [SWAP]
sdb      8:16   0   3.7T  0 disk
└─sdb1   8:17   0   3.7T  0 part /storage
sdc      8:32   0   3.7T  0 disk
└─sdc1   8:33   0   3.7T  0 part /media/chico/wd-elements1
sdd      8:48   0   3.7T  0 disk
└─sdd1   8:49   0   3.7T  0 part /media/chico/wd-elements3
sde      8:64   0   3.7T  0 disk
└─sde1   8:65   0   3.7T  0 part /media/chico/wd-elements2
sr0     11:0    1  1024M  0 rom
The partition that we're interested in is sda1, the first partition on the first disk (mounted at /).
This output is also telling me that there are 6 devices connected to my computer: 5 "disks" named sda, sdb, sdc, sdd, and sde, plus sr0 (which is a DVD drive). It also tells me that the first drive has 7 partitions and the other disks have only one each. Incidentally, these are the types of storage devices in the system:
OK, so back to the command:
sudo debugfs -R 'imap <2>' /dev/sda1
and its output:
debugfs 1.42.9 (4-Feb-2014)
Inode 2 is part of block group 0
        located at block 1057, offset 0x0100
The important information is the last line, which tells us that inode #2 is located 256 bytes (0x0100) within disk block #1057. Now, all we have to do is read the data at that location and we will have read all of the important information about the root directory.
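debugfs's imap is doing simple arithmetic once it knows a few values from the superblock. This sketch assumes 8,192 inodes per group and 256-byte inodes; those values are not shown above, but they are consistent with this imap result and with the two later lookups (you can check your own with sudo dumpe2fs -h /dev/sda1, which reports "Inodes per group" and "Inode size"). It stops short of the last step, adding block_index to the first block of that group's inode table, which comes from the group descriptor.

```c
#include <stdint.h>

/* Where an ext2/3/4 inode lives, relative to its group's inode table. */
struct inode_loc {
    uint32_t group;        /* block group containing the inode */
    uint32_t block_index;  /* block within that group's inode table */
    uint32_t offset;       /* byte offset within that block */
};

/* Inode numbers start at 1, so subtract 1 before dividing. */
struct inode_loc locate_inode(uint32_t ino, uint32_t inodes_per_group,
                              uint32_t inode_size, uint32_t block_size)
{
    struct inode_loc loc;
    uint32_t index = (ino - 1) % inodes_per_group;  /* slot in the table */
    uint64_t byte = (uint64_t)index * inode_size;   /* bytes into the table */

    loc.group = (ino - 1) / inodes_per_group;
    loc.block_index = (uint32_t)(byte / block_size);
    loc.offset = (uint32_t)(byte % block_size);
    return loc;
}
```

For inode 2 it gives group 0, block 0 of the table, offset 0x100 (so group 0's inode table starts at block 1057); for inode 1312210 it gives group 160, offset 0x100; and for inode 1335337 it gives group 163, offset 0x800, matching the debugfs output.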
To help out with my demonstration, I've written my own program that will read any blocks or partial blocks of data from any partition on any device. It's called readblock and you use it like this:
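sudo readblock <partition> <block-number> <offset> <bytes-to-read>
So, to read the raw bytes from inode #2, we do this:
sudo readblock /dev/sda1 1057 0x0100 256
Broken down: the partition is /dev/sda1, the block number is 1057, the offset is 0x0100 (256 bytes into the block), and the number of bytes to read is 256.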
Note: The readblock program is a work-in-progress. It currently reads the device to find out the size of the disk blocks. Generally, the size of the blocks is 4K (4,096) bytes, which is true for all of my partitions. It is important to have the correct block size because that value is used in all of the calculations. The program also allows the user to specify values in hexadecimal (0x prefix) or decimal.
Note: There are existing tools on Linux that will do something similar to my readblock program. However, I wanted to have total control over the output, so I wrote my own. It's only a few lines of code, actually. One such tool on Linux is dd. Very handy, powerful, and dangerous! Read up on it before using it! YOU HAVE BEEN WARNED!
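The source for readblock isn't shown here, so this is only a hypothetical minimal equivalent, not the author's program: it opens the device (or an image file) read-only and uses pread() at BlockNumber * BlockSize + Offset. Unlike the real readblock, it takes the block size as a parameter instead of reading it from the device.

```c
#define _XOPEN_SOURCE 700        /* expose pread() */
#define _FILE_OFFSET_BITS 64     /* 64-bit off_t on 32-bit systems too */
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* BlockNumber * BlockSize + Offset */
uint64_t block_byte_offset(uint64_t block, uint64_t block_size, uint64_t offset)
{
    return block * block_size + offset;
}

/* Hypothetical readblock-like helper: read nbytes from a partition (or
   image file) starting at the given block and offset. Returns the number
   of bytes read, or -1 on error. Real devices usually require root. */
ssize_t read_block(const char *device, uint64_t block, uint64_t block_size,
                   uint64_t offset, void *buf, size_t nbytes)
{
    int fd = open(device, O_RDONLY);
    ssize_t n;

    if (fd == -1)
        return -1;

    n = pread(fd, buf, nbytes, (off_t)block_byte_offset(block, block_size, offset));
    close(fd);
    return n;
}
```

For example, read_block("/dev/sda1", 1057, 4096, 0x100, buf, 256) would fetch the same 256 bytes as the readblock command above, since block_byte_offset(1057, 4096, 0x100) is 4,329,728. With dd, the equivalent (slow, byte-at-a-time) command would be dd if=/dev/sda1 bs=1 skip=4329728 count=256.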
So, the actual bytes that will be read are bytes 4,329,728 through 4,329,983. The way we arrived at those numbers was:
BlockNumber * BlockSize + Offset
1057 * 4096 + 256 = 4,329,472 + 256 = 4,329,728 [starting byte]
4,329,728 + 256 = 4,329,984 [one past the ending byte, so the last byte read is 4,329,983]
Now, because the information in the inode is mostly binary, when displaying it on the screen it will just look like garbage:
�AqY�\Qπ_Qπ7�!$ �P�ɬP��0尦��X
However, it really did read and display (or try to display) 256 bytes of binary data. One thing you can do is redirect the output to a file:
sudo readblock /dev/sda1 1057 0x0100 256 > inode2.bin
On disk, you'll see that it's exactly 256 bytes:
ls -l inode2.bin
Output:
-rw------- 1 chico chico 256 Oct 9 14:46 inode2.bin
Now, you can use any of the bajillion hex viewers to look at it, such as hexdump or od (octal dump):
od -x inode2.bin
Output:
0000000 41ed 0000 1000 0000 5971 5ce0 cf51 5f80
0000020 cf51 5f80 0000 0000 0000 001d 0008 0000
0000040 0000 0008 3714 0000 f30a 0001 0004 0000
0000060 0000 0000 0000 0000 0001 0000 2421 0000
0000100 0000 0000 0000 0000 0000 0000 0000 0000
*
0000200 0020 0000 50ac c9c5 50ac c9c5 1830 b0e5
0000220 ffa6 58ac 0000 0000 0000 0000 0000 0000
0000240 0000 0000 0000 0000 0000 0000 0000 0000
*
0000400
Or, better yet, how about the trusty old dumpit program:
dumpit inode2.bin
Output:
inode2.bin:
       00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
--------------------------------------------------------------------------
000000 ED 41 00 00 00 10 00 00 71 59 E0 5C 51 CF 80 5F .A......qY.\Q.._
000010 51 CF 80 5F 00 00 00 00 00 00 1D 00 08 00 00 00 Q.._............
000020 00 00 08 00 14 37 00 00 0A F3 01 00 04 00 00 00 .....7..........
000030 00 00 00 00 00 00 00 00 01 00 00 00 21 24 00 00 ............!$..
000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000080 20 00 00 00 AC 50 C5 C9 AC 50 C5 C9 30 18 E5 B0 ....P...P..0...
000090 A6 FF AC 58 00 00 00 00 00 00 00 00 00 00 00 00 ...X............
0000A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
This is showing us the actual raw binary data that is stored in the disk block.
In fact, let's skip the temporary file creation and just pipe the output of readblock directly into dumpit:
sudo readblock /dev/sda1 1057 0x0100 256 | dumpit
That will produce the same output! Yeah, pipes are a wonderful thing! (If you don't have access to the dumpit program, you can just use od or something similar.)
       00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
--------------------------------------------------------------------------
000000 ED 41 00 00 00 10 00 00 71 59 E0 5C 51 CF 80 5F .A......qY.\Q.._
000010 51 CF 80 5F 00 00 00 00 00 00 1D 00 08 00 00 00 Q.._............
000020 00 00 08 00 14 37 00 00 0A F3 01 00 04 00 00 00 .....7..........
000030 00 00 00 00 00 00 00 00 01 00 00 00 21 24 00 00 ............!$..
000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000060 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000080 20 00 00 00 AC 50 C5 C9 AC 50 C5 C9 30 18 E5 B0 ....P...P..0...
000090 A6 FF AC 58 00 00 00 00 00 00 00 00 00 00 00 00 ...X............
0000A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Most of the entries are zeros, but there is a bunch of other stuff. Specifically, those values represent permissions (read/write/execute and owner/group) as well as the time/date of the file, how big it is, what type of file/directory it is, etc. However, what we are interested in is the contents of the root directory. Remember, our goal in all of this is to locate and display this file:
/homes/chico/kitchen/refrigerator/cake
Currently, we've just found the root directory's inode. Now, with this, we need to get the contents of the root directory because that's where the homes directory is located. The bytes we need are 21 24 00 00, at offset 0x3C in the output above. (My system is little-endian, so the bytes appear reversed; the number is 0x00002421.) That number is a pointer (block number) to another block that contains the contents (i.e., the filenames) of the root directory.
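The inode bytes can be decoded by hand, or with a short sketch like this one. It assumes the standard ext4 inode layout (i_mode at 0x00, i_size_lo at 0x04, i_links_count at 0x1A, i_block at 0x28) and an inode whose i_block area holds a depth-0 extent tree, which the 0A F3 (0xF30A) magic in the dump indicates. In that format, the pointer discussed above is the low 32 bits of the first extent's starting block. parse_inode and inode_summary are hypothetical names, and inline data and old-style (ext2/ext3) block maps are not handled.

```c
#include <stddef.h>
#include <stdint.h>

/* ext4 stores multi-byte values little-endian */
static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

struct inode_summary {
    uint16_t mode;         /* file type + permissions, e.g. 0x41ED = dir, 0755 */
    uint32_t size;         /* low 32 bits of the file size */
    uint16_t links;        /* hard-link count */
    uint64_t first_block;  /* physical block where the data starts */
};

/* Parse the first 64 bytes of an ext4 inode. Returns 0 on success, or -1
   if i_block does not hold a depth-0 extent tree with at least one extent. */
int parse_inode(const uint8_t *inode, struct inode_summary *out)
{
    const uint8_t *iblock = inode + 0x28;   /* i_block[] starts at 0x28 */

    out->mode  = le16(inode + 0x00);
    out->size  = le32(inode + 0x04);
    out->links = le16(inode + 0x1A);

    /* Extent header: magic(2) entries(2) max(2) depth(2) generation(4) */
    if (le16(iblock) != 0xF30A || le16(iblock + 2) == 0 || le16(iblock + 6) != 0)
        return -1;

    /* First extent follows the 12-byte header:
       ee_block(4) ee_len(2) ee_start_hi(2) ee_start_lo(4) */
    out->first_block = ((uint64_t)le16(iblock + 12 + 6) << 32) | le32(iblock + 12 + 8);
    return 0;
}
```

Given the first 64 bytes of the root inode dump, it reports mode 0x41ED (a directory with 0755 permissions), size 4,096, 29 links (matching stat's Links: 29), and first_block 0x2421.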
Aside: There is a lot of information encoded in that inode, and most of it is not necessary to understand in order to learn how the filesystem works. I will point out some other useful bits of information later. For now, the only piece we are interested in is the location (read: pointer) of the contents of the directory, which is the 21 24 00 00 at offset 0x3C. There are links below that will describe the layout of the inode and all of its data fields in excruciating detail.
Ok, so how do we read the contents? Simple. We use the readblock program again:
sudo readblock /dev/sda1 0x2421 0 512 | dumpit
I'm just reading the first 512 bytes from data block #0x2421 (9249 in decimal), as that will contain what we're looking for. Of course, all data blocks on this filesystem are 4,096 bytes in length, and if I showed every byte, all of the bytes at the end would be 0.
       00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
--------------------------------------------------------------------------
000000 02 00 00 00 0C 00 01 02 2E 00 00 00 02 00 00 00 ................
000010 0C 00 02 02 2E 2E 00 00 0B 00 00 00 14 00 0A 02 ................
000020 6C 6F 73 74 2B 66 6F 75 6E 64 00 00 0C 00 00 00 lost+found......
000030 14 00 0A 07 69 6E 69 74 72 64 2E 69 6D 67 00 00 ....initrd.img..
000040 0D 00 00 00 10 00 07 07 76 6D 6C 69 6E 75 7A 00 ........vmlinuz.
000050 01 00 24 00 0C 00 03 02 62 69 6E 00 01 00 08 00 ..$.....bin.....
000060 0C 00 04 02 62 6F 6F 74 01 00 0C 00 10 00 05 02 ....boot........
000070 63 64 72 6F 6D 00 00 00 01 00 0A 00 0C 00 03 02 cdrom...........
000080 64 65 76 00 01 00 02 00 0C 00 03 02 65 74 63 00 dev.........etc.
000090 01 00 0E 00 0C 00 04 02 68 6F 6D 65 01 00 14 00 ........home....
0000A0 0C 00 03 02 6C 69 62 00 01 00 20 00 10 00 05 02 ....lib... .....
0000B0 6C 69 62 33 32 00 00 00 01 00 10 00 10 00 05 02 lib32...........
0000C0 6C 69 62 36 34 00 00 00 01 00 04 00 10 00 06 02 lib64...........
0000D0 6C 69 62 78 33 32 00 00 01 00 16 00 10 00 05 02 libx32..........
0000E0 6D 65 64 69 61 00 00 00 01 00 06 00 0C 00 03 02 media...........
0000F0 6D 6E 74 00 01 00 22 00 0C 00 03 02 6F 70 74 00 mnt...".....opt.
000100 01 00 18 00 0C 00 04 02 70 72 6F 63 01 00 1C 00 ........proc....
000110 0C 00 04 02 72 6F 6F 74 01 00 1A 00 0C 00 03 02 ....root........
000120 72 75 6E 00 01 00 1E 00 0C 00 04 02 73 62 69 6E run.........sbin
000130 02 00 08 00 0C 00 03 02 73 72 76 00 02 00 04 00 ........srv.....
000140 0C 00 03 02 73 79 73 00 02 00 06 00 0C 00 03 02 ....sys.........
000150 74 6D 70 00 02 00 0A 00 0C 00 03 02 75 73 72 00 tmp.........usr.
000160 02 00 02 00 0C 00 03 02 76 61 72 00 02 00 0C 00 ........var.....
000170 0C 00 04 02 77 65 62 6D 66 77 06 00 14 00 07 02 ....webmfw......
000180 73 74 6F 72 61 67 65 6F 46 71 57 41 0E 00 00 00 storageoFqWA....
000190 10 00 05 01 2E 68 63 77 64 00 00 00 1A 0E 02 00 .....hcwd.......
0001A0 10 00 07 02 2E 63 6F 6E 66 69 67 74 D2 05 14 00 .....configt....
0001B0 10 00 05 02 68 6F 6D 65 73 31 77 76 0F 00 00 00 ....homes1wv....
0001C0 44 0E 10 01 77 65 62 6D 69 6E 2D 73 65 74 75 70 D...webmin-setup
0001D0 2E 6F 75 74 12 00 00 00 2C 0E 12 01 2E 69 73 6D .out....,....ism
0001E0 6F 75 6E 74 2D 74 65 73 74 2D 66 69 6C 65 00 00 ount-test-file..
0001F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
The entry for the directory we're searching for (homes) starts at offset 0x1AC: D2 05 14 00 (the inode number), 10 00 (the length of this record), 05 (the length of the filename), 02 (the type of file), and then the name itself. The name length is needed because, unlike C/C++ strings, these names are not NUL-terminated. The 02 means it's a directory. Files can be of these types:
Code  Type of file
0     Unknown
1     Regular file
2     Directory
3     Character device
4     Block device
5     FIFO
6     Socket
7     Symbolic link
Lastly, and most importantly, I've highlighted the number D2 05 14 00, as this is the inode (little endian) for the homes directory. Remember, in addition to finding and searching the root directory, we also have to find and search the homes, chico, kitchen, and refrigerator directories. This is what's happening "behind the scenes" every time you try to access any file on the system.
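Searching a directory block for a name is a linear walk over variable-length records. This sketch assumes the ext2/3/4 directory entry layout seen in the dump (inode, record length, name length, file type, then the name) and ignores the hashed (htree) index that ext4 can add to large directories; find_dir_entry is a hypothetical name. It advances by the record length rather than the name length, because records can contain padding and leftover bytes (like the 1wv after homes).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Search one linear ext2/3/4 directory block for name. Returns the entry's
   inode number, or 0 if not found. If found and type is non-NULL, *type
   receives the file-type code (1 = regular file, 2 = directory, ...). */
uint32_t find_dir_entry(const uint8_t *block, size_t len, const char *name,
                        uint8_t *type)
{
    size_t name_len = strlen(name);
    size_t off = 0;

    /* Each entry: inode(4) rec_len(2) name_len(1) file_type(1) name[] */
    while (off + 8 <= len)
    {
        const uint8_t *e = block + off;
        uint32_t inode = (uint32_t)e[0] | ((uint32_t)e[1] << 8) |
                         ((uint32_t)e[2] << 16) | ((uint32_t)e[3] << 24);
        uint16_t rec_len = (uint16_t)(e[4] | (e[5] << 8));
        uint8_t nlen = e[6];

        if (rec_len < 8)                  /* corrupt entry: stop */
            break;
        if (off + 8 + nlen > len)         /* name runs past our buffer */
            break;
        /* inode 0 marks an unused entry; names are NOT NUL-terminated */
        if (inode != 0 && nlen == name_len && memcmp(e + 8, name, nlen) == 0)
        {
            if (type)
                *type = e[7];
            return inode;
        }
        off += rec_len;                   /* rec_len includes any slack */
    }
    return 0;
}
```

Given the full 4,096-byte block 0x2421, find_dir_entry(block, 4096, "homes", &type) should return 0x1405D2 (1,312,210) with type 2, the same answer found by hand above.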
Incidentally, the 2E and 2E 2E values at the top of the output correspond to the current directory (a single dot, .) and the parent directory (two dots, ..), which are two entries you will find in every directory (even the root, which has no parent; its .. entry simply points back to inode 2).
Ok, so now it's time to search through the contents of the homes directory and see if we can locate the directory named chico.
First, we have to read the inode for the homes directory. We know that the inode number is 0x001405D2 because that's what we found in the root directory. Converting the hex to decimal, we get 1312210. To verify that we are actually correct, we can simply stat the homes directory:
stat /homes
Output:
File: '/homes'
Size: 4096 Blocks: 8 IO Block: 4096 directory
Device: 801h/2049d Inode: 1312210 Links: 8
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2020-10-08 13:15:10.002908649 -0700
Modify: 2020-10-08 13:20:28.402906431 -0700
Change: 2020-10-08 13:20:28.402906431 -0700
Birth: -
Of course, we could have done this as well:
ls -ldi /homes
Output:
1312210 drwxr-xr-x 8 root root 4,096 Oct 8 13:20 /homes
Ok, let's dump that inode using readblock. First, we have to find out where (read: in which disk block) the inode resides. Using debugfs again to map the inode number to a disk block:
sudo debugfs -R 'imap <1312210>' /dev/sda1
Output:
Inode 1312210 is part of block group 160
        located at block 5243005, offset 0x0100
Using this information, we can read the block:
sudo readblock /dev/sda1 5243005 0x0100 256 | dumpit
Output:
       00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
--------------------------------------------------------------------------
000000 ED 41 00 00 00 10 00 00 4E 73 7F 5F 8C 74 7F 5F .A......Ns._.t._
000010 8C 74 7F 5F 00 00 00 00 00 00 08 00 08 00 00 00 .t._............
000020 00 00 08 00 07 00 00 00 0A F3 01 00 04 00 00 00 ................
000030 00 00 00 00 00 00 00 00 01 00 00 00 AF 2B 50 00 .............+P.
000040 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000060 00 00 00 00 2D AD E8 08 00 00 00 00 00 00 00 00 ....-...........
000070 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000080 1C 00 00 00 FC 74 0F 60 FC 74 0F 60 A4 87 B1 00 .....t.`.t.`....
000090 4E 73 7F 5F A4 87 B1 00 00 00 00 00 00 00 00 00 Ns._............
0000A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
This is the inode for the homes directory. We need to see the contents (read: filenames) of the directory. The pointer to the contents is AF 2B 50 00 at offset 0x3C, i.e., block 0x00502BAF. Now, read that block to get the contents:
sudo readblock /dev/sda1 0x00502BAF 0 256 | dumpit
Output:
       00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
--------------------------------------------------------------------------
000000 D2 05 14 00 0C 00 01 02 2E 00 00 00 02 00 00 00 ................
000010 0C 00 02 02 2E 2E 00 00 29 60 14 00 10 00 05 02 ........)`......
000020 63 68 69 63 6F 00 00 00 38 60 14 00 10 00 05 02 chico...8`......
000030 61 6C 76 69 6E 00 00 00 39 60 14 00 10 00 08 02 alvin...9`......
000040 76 65 72 6F 6E 69 63 61 3A 60 14 00 10 00 05 02 veronica:`......
000050 62 65 74 74 79 00 00 00 3B 60 14 00 0C 00 04 02 betty...;`......
000060 66 72 65 64 3C 60 14 00 9C 0F 05 02 77 69 6C 6D fred<`......wilm
000070 61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 a...............
000080 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
000090 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000A0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000B0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000C0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000D0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000E0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0000F0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
Aw, yeah! Now we're cookin' with gas! I've highlighted the name (chico) and its corresponding inode (0x00146029). Remember, this is what's in /homes:
chico@nina ~ $ ls -l /homes
total 24,576
drwxr-xr-x 2 alvin    alvin    4,096 Oct 8 13:20 alvin
drwxr-xr-x 2 betty    betty    4,096 Oct 8 13:20 betty
drwxr-xr-x 8 chico    chico    4,096 Oct 8 13:20 chico
drwxr-xr-x 2 fred     fred     4,096 Oct 8 13:20 fred
drwxr-xr-x 2 veronica veronica 4,096 Oct 8 13:20 veronica
drwxr-xr-x 2 wilma    wilma    4,096 Oct 8 13:20 wilma
chico@nina ~ $
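The /homes directory block is just a sequence of variable-length directory entries. Each entry holds an inode number, a record length, a name length, a file type, and the name. Here's a minimal Python sketch that walks such a block. It assumes the classic linear ext4 layout (struct ext4_dir_entry_2), not the hashed (htree) layout that large directories can use. The names `parse_dir_block` and `HOMES_BLOCK` are made up for this example, and the bytes are the first 128 bytes of the /homes dump.

```python
import struct

# First 128 bytes of the /homes directory block (0x00502BAF) from the dump above.
HOMES_BLOCK = bytes.fromhex(
    "D2 05 14 00 0C 00 01 02 2E 00 00 00 02 00 00 00"
    " 0C 00 02 02 2E 2E 00 00 29 60 14 00 10 00 05 02"
    " 63 68 69 63 6F 00 00 00 38 60 14 00 10 00 05 02"
    " 61 6C 76 69 6E 00 00 00 39 60 14 00 10 00 08 02"
    " 76 65 72 6F 6E 69 63 61 3A 60 14 00 10 00 05 02"
    " 62 65 74 74 79 00 00 00 3B 60 14 00 0C 00 04 02"
    " 66 72 65 64 3C 60 14 00 9C 0F 05 02 77 69 6C 6D"
    " 61 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00"
)

def parse_dir_block(block):
    """Return (name, inode, file_type) for each in-use entry in a linear
    ext4 directory block (ext4_dir_entry_2, little-endian)."""
    entries = []
    offset = 0
    while offset + 8 <= len(block):
        # 4-byte inode, 2-byte rec_len, 1-byte name_len, 1-byte file_type
        inode, rec_len, name_len, file_type = struct.unpack_from("<IHBB", block, offset)
        if rec_len < 8:  # corrupt or truncated entry: stop instead of looping forever
            break
        if inode != 0:  # inode 0 marks an unused slot
            name = block[offset + 8 : offset + 8 + name_len].decode("ascii")
            entries.append((name, inode, file_type))
        offset += rec_len  # rec_len also skips any padding after the name
    return entries

entries = parse_dir_block(HOMES_BLOCK)
for name, inode, file_type in entries:
    kind = "dir" if file_type == 2 else "file"  # 2 = directory, 1 = regular file
    print(f"{name:10} inode 0x{inode:08X} ({kind})")
```

Running it lists ., .., chico, alvin, veronica, betty, fred, and wilma, all with file type 2 (directory). It also recovers chico's inode, 0x00146029. The last entry (wilma) has a record length of 0x0F9C, which stretches it to the end of the 4,096-byte block. That is why the loop stops there.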
We can find the block that contains this inode for /homes/chico:
sudo debugfs -R 'imap <0x00146029>' /dev/sda1
Output:
Inode 1335337 is part of block group 163 located at block 5244450, offset 0x0800
Then dump the inode:
sudo readblock /dev/sda1 5244450 0x0800 256 | dumpit
Output:
        00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
--------------------------------------------------------------------------
000000  ED 41 EA 03 00 10 00 00 73 73 7F 5F B4 74 7F 5F  .A......ss._.t._
000010  A4 74 7F 5F 00 00 00 00 EB 03 08 00 08 00 00 00  .t._............
000020  00 00 08 00 13 00 00 00 0A F3 01 00 04 00 00 00  ................
000030  00 00 00 00 00 00 00 00 01 00 00 00 B8 2B 50 00  .............+P.
000040  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000060  00 00 00 00 32 AD E8 08 00 00 00 00 00 00 00 00  ....2...........
000070  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000080  1C 00 00 00 90 8C 6F E8 68 5E 5E 07 90 A3 64 82  ......o.h^^...d.
000090  73 73 7F 5F 90 A3 64 82 00 00 00 00 00 00 00 00  ss._..d.........
0000A0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000B0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000C0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000D0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000E0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000F0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
To get the contents of the /homes/chico directory, we have to follow the pointer at offset 0x3C (0x00502BB8) and dump the first few bytes of that block:
sudo readblock /dev/sda1 0x502BB8 0 256 | dumpit
Output:
        00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
--------------------------------------------------------------------------
000000  29 60 14 00 0C 00 01 02 2E 00 00 00 D2 05 14 00  )`..............
000010  0C 00 02 02 2E 2E 00 00 2A 60 14 00 10 00 07 02  ........*`......
000020  2E 63 6F 6E 66 69 67 00 2B 60 14 00 10 00 08 02  .config.+`......
000030  2E 6D 6F 7A 69 6C 6C 61 B7 15 14 00 1C 00 11 01  .mozilla........
000040  2E 63 6F 6D 70 74 6F 6E 2D 74 64 65 2E 63 6F 6E  .compton-tde.con
000050  66 50 30 00 B6 15 14 00 14 00 0C 01 2E 62 61 73  fP0..........bas
000060  68 5F 6C 6F 67 6F 75 74 B9 15 14 00 18 00 0B 01  h_logout........
000070  2E 78 63 6F 6D 70 6D 67 72 72 63 4C 53 63 74 6E  .xcompmgrrcLSctn
000080  B8 15 14 00 10 00 08 01 2E 70 72 6F 66 69 6C 65  .........profile
000090  3D 60 14 00 10 00 07 02 6B 69 74 63 68 65 6E 67  =`......kitcheng
0000A0  3E 60 14 00 10 00 07 02 62 65 64 72 6F 6F 6D 00  >`......bedroom.
0000B0  3F 60 14 00 10 00 08 02 62 61 74 68 72 6F 6F 6D  ?`......bathroom
0000C0  40 60 14 00 40 0F 06 02 67 61 72 61 67 65 00 00  @`..@...garage..
0000D0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000E0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000F0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
As a reminder, this is what's in /homes/chico. There are 4 visible directories there. You'll also see a few hidden files and directories (names that begin with a dot), but we can ignore those.
This time, look at the kitchen entry at offset 0x90. Its inode is 0x0014603D (stored as 3D 60 14 00), and we need it because now we have to find the refrigerator directory inside the kitchen directory.
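The debugfs imap command we keep using is doing simple arithmetic. Here's a sketch of that calculation under stated assumptions: 8,192 inodes per group, 256-byte inodes, and 4,096-byte blocks. These values are consistent with every imap result on this page, and `sudo dumpe2fs -h /dev/sda1` shows the real values as "Inodes per group" and "Inode size". The function `locate_inode` is hypothetical. It takes the first block of the group's inode table as a parameter. The 5244448 used below was inferred by working backwards from the imap output; a real tool would read it from the group descriptor table.

```python
def locate_inode(ino, inodes_per_group, inode_size, block_size, inode_table_start):
    """Return (block_group, block, byte_offset) for an inode number.

    inode_table_start is the first block of the group's inode table; a real
    tool reads it from the group descriptor table (hypothetical input here).
    """
    index = ino - 1                                    # inode numbers start at 1
    group = index // inodes_per_group                  # which block group
    offset = (index % inodes_per_group) * inode_size   # byte offset into the table
    return group, inode_table_start + offset // block_size, offset % block_size

# Assumed geometry; 5244448 is inferred from the imap output, not read from disk.
for ino in (1335337, 1335357, 1335361):  # chico, kitchen, refrigerator
    group, block, off = locate_inode(ino, 8192, 256, 4096, 5244448)
    print(f"Inode {ino} is part of block group {group} located at block {block}, offset 0x{off:04x}")
```

With these assumptions, the loop prints the same three lines that debugfs prints on this page. Because 16 of these 256-byte inodes fit in one 4,096-byte block, neighboring inodes such as chico's and kitchen's end up only a block or two apart.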
Now, find the block that contains the inode:
sudo debugfs -R 'imap <0x0014603D>' /dev/sda1
Output:
Inode 1335357 is part of block group 163 located at block 5244451, offset 0x0c00
Then, dump the inode:
sudo readblock /dev/sda1 5244451 0x0c00 256 | dumpit
Output:
        00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
--------------------------------------------------------------------------
000000  ED 41 00 00 00 10 00 00 A4 74 7F 5F CE 74 7F 5F  .A.......t._.t._
000010  CE 74 7F 5F 00 00 00 00 00 00 08 00 08 00 00 00  .t._............
000020  00 00 08 00 07 00 00 00 0A F3 01 00 04 00 00 00  ................
000030  00 00 00 00 00 00 00 00 01 00 00 00 FE 23 50 00  .............#P.
000040  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000060  00 00 00 00 D2 AD E8 08 00 00 00 00 00 00 00 00  ................
000070  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000080  1C 00 00 00 CC FD DF 63 CC FD DF 63 68 5E 5E 07  .......c...ch^^.
000090  A4 74 7F 5F 68 5E 5E 07 00 00 00 00 00 00 00 00  .t._h^^.........
0000A0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000B0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000C0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000D0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000E0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000F0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
To get the contents of the /homes/chico/kitchen directory, we need to follow the pointer (block number) at offset 0x3C (0x005023FE) and dump the first few bytes of that block:
sudo readblock /dev/sda1 0x005023FE 0 256 | dumpit
Output:
        00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
--------------------------------------------------------------------------
000000  3D 60 14 00 0C 00 01 02 2E 00 00 00 29 60 14 00  =`..........)`..
000010  0C 00 02 02 2E 2E 00 00 41 60 14 00 14 00 0C 02  ........A`......
000020  72 65 66 72 69 67 65 72 61 74 6F 72 42 60 14 00  refrigeratorB`..
000030  0C 00 04 02 73 69 6E 6B 43 60 14 00 14 00 09 02  ....sinkC`......
000040  63 75 70 62 6F 61 72 64 73 00 00 00 44 60 14 00  cupboards...D`..
000050  0C 00 04 02 6F 76 65 6E 45 60 14 00 10 00 05 02  ....ovenE`......
000060  73 74 6F 76 65 00 00 00 46 60 14 00 98 0F 09 02  stove...F`......
000070  6D 69 63 72 6F 77 61 76 65 00 00 00 00 00 00 00  microwave.......
000080  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000090  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000A0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000B0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000C0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000D0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000E0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000F0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Again, this is what's in /homes/chico/kitchen. You'll see 6 visible directories. Now that we've located the refrigerator directory (inode 0x00146041, at offset 0x18), it's time to find out what's in it.
As usual, find the block that contains the inode for refrigerator:
sudo debugfs -R 'imap <0x00146041>' /dev/sda1
Output:
Inode 1335361 is part of block group 163 located at block 5244452, offset 0x0000
Then dump the inode:
        00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
--------------------------------------------------------------------------
000000  ED 41 EA 03 00 10 00 00 CE 74 7F 5F 5E 75 7F 5F  .A.......t._^u._
000010  10 75 7F 5F 00 00 00 00 EB 03 02 00 08 00 00 00  .u._............
000020  00 00 08 00 0B 00 00 00 0A F3 01 00 04 00 00 00  ................
000030  00 00 00 00 00 00 00 00 01 00 00 00 22 24 50 00  ............"$P.
000040  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000060  00 00 00 00 DD AD E8 08 00 00 00 00 00 00 00 00  ................
000070  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000080  1C 00 00 00 1C 92 58 83 A8 26 C4 13 CC FD DF 63  ......X..&.....c
000090  CE 74 7F 5F CC FD DF 63 00 00 00 00 00 00 00 00  .t._...c........
0000A0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000B0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000C0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000D0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000E0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000F0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Then follow the pointer at offset 0x3C (0x00502422) to get the contents of refrigerator:
sudo readblock /dev/sda1 0x00502422 0 256 | dumpit
Output:
        00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
--------------------------------------------------------------------------
000000  41 60 14 00 0C 00 01 02 2E 00 00 00 3D 60 14 00  A`..........=`..
000010  0C 00 02 02 2E 2E 00 00 FB 15 14 00 0C 00 04 01  ................
000020  6D 69 6C 6B FC 15 14 00 0C 00 04 01 65 67 67 73  milk........eggs
000030  FD 15 14 00 10 00 06 01 62 75 74 74 65 72 00 00  ........butter..
000040  FE 15 14 00 10 00 05 01 6A 75 69 63 65 00 00 00  ........juice...
000050  FF 15 14 00 10 00 06 01 63 68 65 65 73 65 00 00  ........cheese..
000060  00 16 14 00 0C 00 04 01 63 6F 6B 65 01 16 14 00  ........coke....
000070  10 00 06 01 61 70 70 6C 65 73 00 00 02 16 14 00  ....apples......
000080  10 00 07 01 63 68 69 63 6B 65 6E 00 03 16 14 00  ....chicken.....
000090  0C 00 04 01 63 61 6B 65 04 16 14 00 68 0F 03 01  ....cake....h...
0000A0  70 69 65 00 00 00 00 00 00 00 00 00 00 00 00 00  pie.............
0000B0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000C0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000D0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000E0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000F0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
As a reminder, this is what's in /homes/chico/kitchen/refrigerator. This time you'll see 10 visible files (not directories). Now that we've located the cake file, it's time to find out what's in it. The process is the same as it is for directories.
We have the inode for cake: the 4 bytes at offset 0x8C (03 16 14 00) tell us it is inode 0x00141603. Let's dump out the inode. First, get the block that contains it:
sudo debugfs -R 'imap <0x00141603>' /dev/sda1
Output:
Inode 1316355 is part of block group 160 located at block 5243264, offset 0x0200
Then, dump the inode:
        00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
--------------------------------------------------------------------------
000000  A4 81 EA 03 4C 00 00 00 10 75 7F 5F 79 C7 80 5F  ....L....u._y.._
000010  79 C7 80 5F 00 00 00 00 EB 03 01 00 08 00 00 00  y.._............
000020  00 00 08 00 01 00 00 00 0A F3 01 00 04 00 00 00  ................
000030  00 00 00 00 00 00 00 00 01 00 00 00 08 20 3E 00  ............. >.
000040  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000060  00 00 00 00 ED AD E8 08 00 00 00 00 00 00 00 00  ................
000070  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000080  1C 00 00 00 54 B3 DE 60 54 B3 DE 60 A8 26 C4 13  ....T..`T..`.&..
000090  10 75 7F 5F A8 26 C4 13 00 00 00 00 00 00 00 00  .u._.&..........
0000A0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000B0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000C0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000D0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000E0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000F0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Also notice the 4 bytes at offset 0x04 (4C 00 00 00), because they have a specific meaning: they are the actual size of the file (0x0000004C is 76 in decimal). We'll return to this value shortly.
If we follow the pointer at offset 0x3C (0x003E2008), it will take us to the contents of the cake file, which is the actual text in the file.
sudo readblock /dev/sda1 0x003E2008 0 128 | dumpit
Output:
        00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
--------------------------------------------------------------------------
000000  65 67 67 73 0A 62 75 74 74 65 72 0A 6D 69 6C 6B  eggs.butter.milk
000010  0A 66 6C 6F 75 72 0A 76 61 6E 69 6C 6C 61 0A 69  .flour.vanilla.i
000020  63 69 6E 67 0A 73 74 72 61 77 62 65 72 72 69 65  cing.strawberrie
000030  73 0A 70 65 61 63 68 65 73 0A 6C 65 74 74 75 63  s.peaches.lettuc
000040  65 0A 61 73 70 61 72 61 67 75 73 0A 00 00 00 00  e.asparagus.....
000050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000060  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000070  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
This time, I chose to dump out just 128 bytes, because the file is smaller than that. Because all of the data in the cake file is text, I don't need to use dumpit; I can just display it:
sudo readblock /dev/sda1 0x003E2008 0 128
Output:
eggs
butter
milk
flour
vanilla
icing
strawberries
peaches
lettuce
asparagus
It stops after asparagus because that's the end of the printable characters: the rest of the 128 bytes are zeros, and character 0 doesn't print anything. This is the exact output we got from our original command:
cat /homes/chico/kitchen/refrigerator/cake
This was the entire purpose of this demonstration: to show exactly what is going on behind the scenes. So, to get back to the original question: "How many disk reads were required to locate (search), open (read), and display the file?"
Now, you should be able to answer that.
If we use ls to look at the file:
ls -l /homes/chico/kitchen/refrigerator/cake
Output:
-rw-r--r-- 1 chico chico 76 Oct 9 13:26 /homes/chico/kitchen/refrigerator/cake
We can, in fact, see that the file size is 76 bytes. That value comes from the inode (metadata) for this file, which was shown above.
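The 76-byte size in the inode is what tells a reader where the file's data ends. The rest of the 4,096-byte block is just the zeros we saw in the dump. Here's a minimal Python sketch of that idea. It assumes a file small enough to fit in its one data block, and it reads only i_size_lo, the low 32 bits of the size at offset 0x04. `valid_bytes` is a hypothetical helper, not part of any tool used here.

```python
import struct

# First 8 bytes of the cake inode (mode, uid, size) from the dump above.
CAKE_INODE_START = bytes.fromhex("A4 81 EA 03 4C 00 00 00")

# First 0x50 bytes of the cake data block, zero-padded like the real 4,096-byte block.
CAKE_BLOCK = bytes.fromhex(
    "65 67 67 73 0A 62 75 74 74 65 72 0A 6D 69 6C 6B"
    " 0A 66 6C 6F 75 72 0A 76 61 6E 69 6C 6C 61 0A 69"
    " 63 69 6E 67 0A 73 74 72 61 77 62 65 72 72 69 65"
    " 73 0A 70 65 61 63 68 65 73 0A 6C 65 74 74 75 63"
    " 65 0A 61 73 70 61 72 61 67 75 73 0A 00 00 00 00"
).ljust(4096, b"\0")

def valid_bytes(inode, block):
    """Trim a single data block to the file size stored in the inode.

    Only i_size_lo (offset 0x04) is read; the high 32 bits of the size are
    ignored, so this sketch assumes a file smaller than 4 GiB.
    """
    (size,) = struct.unpack_from("<I", inode, 0x04)
    return block[:size]

print(valid_bytes(CAKE_INODE_START, CAKE_BLOCK).decode("ascii"), end="")
```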
Remember these two files from before? I said that this file:
/usr/hostname
is going to require significantly less work than locating this file:
/usr/share/icons/foo/bar/baz/bat/one/more/dir/and/were/done/file.txt
The reason is the depth of each path. Each directory adds 2 disk reads to the process: one for its inode and one for its contents. So, files stored very deep in the hierarchy are much more expensive to read than shallow ones. To reach the hostname file, the system only has to read 2 directories (root and usr), but to reach file.txt it has to read 14 directories (the root directory plus 13 subdirectories)!
Notes:
Suppose these two files are read one after the other:
/usr/share/icons/foo/bar/baz/bat/one/more/dir/and/were/done/file1.txt
/usr/share/icons/foo/bar/baz/bat/one/more/dir/and/were/done/file2.txt
The first file will incur a steep cost to locate and read all of the directories leading up to it. However, the second file will likely require only 2 disk reads (its own inode and data block), because all of the directories (inodes and data blocks) are likely to still be cached in memory, which makes those lookups very fast.
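The cost model from these paragraphs can be turned into a tiny calculator. This Python sketch is a simplification built on the page's own assumptions. Each directory costs 2 reads (its inode and one data block). The file itself costs 2 more. Anything already in the cache is free. `count_reads` is a made-up name, and a real kernel caches blocks, not paths.

```python
def count_reads(path, cache):
    """Disk reads needed to open and read a small file under this page's model.

    Each directory costs 2 reads (its inode and its single data block), the
    file costs 2 more (its inode and one data block), and anything already in
    `cache` costs nothing. Real kernels cache blocks (one 4,096-byte inode-table
    block holds 16 256-byte inodes), not paths, so this is only a model.
    """
    parts = [p for p in path.split("/") if p]
    # Directories to search: the root plus every component except the file itself.
    dirs = ["/"] + ["/" + "/".join(parts[:i]) for i in range(1, len(parts))]
    reads = 0
    for d in dirs:
        for item in ((d, "inode"), (d, "data")):
            if item not in cache:
                reads += 1
                cache.add(item)
    return reads + 2  # the file's own inode and data block

deep = "/usr/share/icons/foo/bar/baz/bat/one/more/dir/and/were/done/"
cache = set()
print(count_reads("/usr/hostname", set()))      # shallow file, cold cache
print(count_reads(deep + "file1.txt", cache))   # deep file, cold cache
print(count_reads(deep + "file2.txt", cache))   # same directories, now cached
```

Under this model the three calls print 6, 30, and 2. The 30 comes from 14 directories at 2 reads each, plus 2 for the file. The 2 for file2.txt is the "likely only 2 disk reads" case from the note.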
More Inode Details
We saw that there was quite a bit of information in the inodes that we were ignoring. We were basically just interested in using the inode to find the data block(s) associated with the file/directory. Let's look a little closer at the inode for the cake file:
ls -li /homes/chico/kitchen/refrigerator/cake
Output:
1316355 -rw-r--r-- 1 chico chico 76 Oct 12 14:24 /homes/chico/kitchen/refrigerator/cake
Output annotated:
1316355 -rw-r--r-- 1 chico chico 76 Oct 12 14:24 /homes/chico/kitchen/refrigerator/cake
^^^^^^^ ^^^^^^^^^^ ^ ^^^^^ ^^^^^ ^^ ^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 inode    perms    |   |     |   |    date/time           fullpath of the file
           type    |   |     |   |
                  /    |     |    \
                 /     |     |     \
            links    owner group   size
The ls command shows us a lot of information, and all of it except the path is stored in the inode for the file. Let's break it down. These are the fields (from left to right) and their meanings:
Field          Description
1316355        This is the inode for the file.
-rw-r--r--     These are the permissions for the user, group, and others, as well as the type of file.
1              The number of hard links to this file.
chico          The owner (user) of the file.
chico          The group that the file belongs to.
76             The size of the file (in bytes).
Oct 12 14:24   The date/time that the contents of the file were last modified.
Using the same technique to find the inode and dump out its contents:
sudo debugfs -R 'imap <1316355>' /dev/sda1
Output:
Inode 1316355 is part of block group 160 located at block 5243264, offset 0x0200
Dump the inode:
sudo readblock /dev/sda1 5243264 0x200 256 | dumpit
Output:
        00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
--------------------------------------------------------------------------
000000  A4 81 EA 03 4C 00 00 00 10 75 7F 5F 76 C9 84 5F  ....L....u._v.._
000010  76 C9 84 5F 00 00 00 00 EB 03 01 00 08 00 00 00  v.._............
000020  00 00 08 00 01 00 00 00 0A F3 01 00 04 00 00 00  ................
000030  00 00 00 00 00 00 00 00 01 00 00 00 F8 60 24 00  .............`$.
000040  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000060  00 00 00 00 ED AD E8 08 00 00 00 00 00 00 00 00  ................
000070  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000080  1C 00 00 00 E4 98 9D 1D E4 98 9D 1D A8 26 C4 13  .............&..
000090  10 75 7F 5F A8 26 C4 13 00 00 00 00 00 00 00 00  .u._.&..........
0000A0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000B0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000C0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000D0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000E0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000F0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
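The inode stores the owner and group as numbers, not names. The 2 bytes at offset 0x02 (EA 03) are the user ID 0x03EA (1002), and the 2 bytes at offset 0x18 (EB 03) are the group ID 0x03EB (1003). Run this command:
cat /etc/passwd | grep chico
and you'll see this (on my system):
chico:x:1002:1003:Chico Escuela:/home/chico:/bin/bash
You can plainly see that the third field (1002) is the user ID of chico and the fourth field (1003) is chico's group ID. Now run this command:
cat /etc/group | grep chico
and you'll see this (on my system):
chico:x:1003:
That is how the numbers in the inode become the names that ls displays:
1316355 -rw-r--r-- 1 chico chico 76 Oct 12 14:24 /homes/chico/kitchen/refrigerator/cake
The rest of the information encodes things like the creation date/time, last access date/time, checksums, version, high 32 bits of the size, and several other obsolete, reserved, and advanced pieces of information.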
So, in a nutshell, the inode stores all of the information about a file with the exception of the filename. We saw that the filenames are stored in a directory's contents. This information in the inode is called metadata.
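To make "metadata" concrete, here's a sketch that decodes the fixed fields in the first 32 bytes of the cake inode with Python's struct module. It assumes the standard ext4 inode layout: mode, uid, size, four timestamps, gid, link count, and block count, all little-endian. It reads only the low halves. The high 16 bits of uid/gid and the high 32 bits of the size live elsewhere in the inode, and so do ext4's extra timestamp fields, so this sketch ignores them. `decode_inode_head` and `FIELDS` are names invented for this example.

```python
import stat
import struct
from datetime import datetime, timezone

# First 0x20 bytes of the cake inode (second dump above).
CAKE_INODE = bytes.fromhex(
    "A4 81 EA 03 4C 00 00 00 10 75 7F 5F 76 C9 84 5F"
    " 76 C9 84 5F 00 00 00 00 EB 03 01 00 08 00 00 00"
)

# Field order of offsets 0x00-0x1F in an ext4 inode.
FIELDS = ("mode", "uid", "size", "atime", "ctime", "mtime", "dtime",
          "gid", "links", "blocks")

def decode_inode_head(raw):
    """Decode offsets 0x00-0x1F of an ext4 inode (low halves of each field only)."""
    return dict(zip(FIELDS, struct.unpack_from("<HHIIIIIHHI", raw, 0)))

f = decode_inode_head(CAKE_INODE)
print(stat.filemode(f["mode"]), f["links"], f["uid"], f["gid"], f["size"])
print("mtime:", datetime.fromtimestamp(f["mtime"], tz=timezone.utc))
print("512-byte blocks:", f["blocks"])
```

The first line prints `-rw-r--r-- 1 1002 1003 76`, which matches what ls showed once the IDs are mapped to names. The modification time decodes to 2020-10-12 21:24:06 UTC. That is the 14:24 that ls displayed, expressed in a UTC-7 (PDT) local time.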
Extents
What about the actual contents of a file or directory? We saw that every inode has a pointer (block number) to the actual data blocks that store the contents. However, we know that, traditionally, an inode has several block pointers (15 to be exact). Recall the inode diagram and the (partial) inode for the cake file:
        00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
--------------------------------------------------------------------------
000000  A4 81 EA 03 4C 00 00 00 10 75 7F 5F 76 C9 84 5F  ....L....u._v.._
000010  76 C9 84 5F 00 00 00 00 EB 03 01 00 08 00 00 00  v.._............
000020  00 00 08 00 01 00 00 00 0A F3 01 00 04 00 00 00  ................
000030  00 00 00 00 00 00 00 00 01 00 00 00 F8 60 24 00  .............`$.
000040  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000060  00 00 00 00 ED AD E8 08 00 00 00 00 00 00 00 00  ................
000070  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000080  1C 00 00 00 E4 98 9D 1D E4 98 9D 1D A8 26 C4 13  .............&..
000090  10 75 7F 5F A8 26 C4 13 00 00 00 00 00 00 00 00  .u._.&..........
0000A0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000B0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000C0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000D0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000E0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0000F0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Where are all (15) of the block pointers? Up until now I've just "magically" been saying that the data can be found by following the 4 bytes at offset 0x003C (F8 60 24 00) and that this is the pointer to the data block (singular). Also, since all of our data (contents of blocks) thus far has been less than 4,096 bytes, we've never needed more than one pointer/block. Yes, there appear to be a lot of "empty" pointers after it, but there aren't 15 of them. Remember this self-check from above?
Self-check - With multiple levels of indirection, filesystems can be implemented efficiently for fragmented files. However, for non-fragmented (i.e. contiguous) files, this approach is not very efficient. Explain why that is and how a better method can be used.
First, we need to see why the "old" inode scheme of many levels of indirection is good for fragmented files but bad for non-fragmented (contiguous) files. Once we understand this, a better "solution" is obvious, and that solution is extents.
Many older and less sophisticated filesystems suffered from fragmented files, so the original inode scheme made sense. However, on many modern filesystems very few files are fragmented, so this scheme is sub-optimal.
As an example, let's assume we have a file called file.txt that is 18,000 bytes in size. With block sizes of 4,096 bytes, the contents of the file will require 5 blocks, with the first 4 blocks being full and the last block containing 1,616 bytes.
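The block count and the number of bytes in the last block come from integer division with a remainder. Here's a tiny sketch; `blocks_for` is a hypothetical helper, and it assumes every block except the last one is full, as in this example.

```python
def blocks_for(size, block_size=4096):
    """Return (blocks needed, bytes used in the last block) for a file size."""
    full, tail = divmod(size, block_size)
    if tail:
        return full + 1, tail          # a partially filled last block
    return full, block_size if full else 0  # exact fit, or an empty file

print(blocks_for(18_000))  # file.txt: 4 full blocks plus one holding 1,616 bytes
```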
With linked allocation, we would have something like this:
[Figure: five linked blocks, numbered 1 to 5, holding 4,096 + 4,096 + 4,096 + 4,096 + 1,616 = 18,000 bytes]
With indexed allocation, we would have something like this:
The blue number above the block is the (arbitrary) byte address of the block. The numbers inside the blocks are the size of the data in the block. Because the data blocks are not contiguous, the file is fragmented.
For each of these two different schemes, answer these questions:
Linked allocation:
Indexed allocation:
Answer the same questions:
Remember, non-fragmented blocks act more like arrays than linked lists because all of the data is contiguous. We can take advantage of this fact by using extents.
To see just how poorly a fragmented disk can perform, here is a forum post that I made (from July 2005). I've always been a big fan of Microsoft's Flight Simulator and have about 1,000,000 files (photo-realistic textures for parts of the United States). Because the frame rate depends so much on reading many files per second from the slow disk, any fragmentation is going to make things even worse. You can see the significant improvements from 1) defragmenting the MFT (Master File Table) and 2) moving important files (e.g. textures) to the outer tracks of the spinning disks. This works because every track spins at the same angular velocity, but the outer tracks are longer, so they move past the head at a higher linear velocity and deliver more data per revolution, which increases performance.
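Here's a small sketch of the array-versus-list idea. It takes the per-block list of physical block numbers that an indexed scheme would store, one entry per block. It collapses runs of consecutive blocks into (logical start, length, physical start) triples, which are the same three numbers an ext4 extent records. `to_extents` is a made-up name, and the block numbers are hypothetical. Real ext4 also caps the length of a single extent (32,768 blocks), which this sketch doesn't model.

```python
def to_extents(blocks):
    """Collapse physical block numbers (listed in file order) into extents.

    Each extent is (logical_start, length, physical_start).
    """
    extents = []
    for logical, phys in enumerate(blocks):
        if extents:
            lstart, length, pstart = extents[-1]
            if phys == pstart + length:        # next block on disk: extend the run
                extents[-1] = (lstart, length + 1, pstart)
                continue
        extents.append((logical, 1, phys))     # first block, or a gap: new extent
    return extents

contiguous = [500, 501, 502, 503, 504]   # hypothetical block numbers
fragmented = [500, 501, 900, 901, 902]
print(to_extents(contiguous))  # [(0, 5, 500)]: five pointers become one extent
print(to_extents(fragmented))  # [(0, 2, 500), (2, 3, 900)]: a gap forces a second extent
```

A contiguous file costs one small record no matter how many blocks it has. Each fragment, however, adds another extent, which is why extents shine on filesystems where fragmentation is rare.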
Using extents, physical view:
Using extents, logical view:
So, looking back at the (partial) inode for the cake file, we can see the extents that are in use:
        00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
--------------------------------------------------------------------------
000000  A4 81 EA 03 4C 00 00 00 10 75 7F 5F 76 C9 84 5F  ....L....u._v.._
000010  76 C9 84 5F 00 00 00 00 EB 03 01 00 08 00 00 00  v.._............
000020  00 00 08 00 01 00 00 00 0A F3 01 00 04 00 00 00  ................
000030  00 00 00 00 00 00 00 00 01 00 00 00 F8 60 24 00  .............`$.
000040  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000060  00 00 00 00 ED AD E8 08 00 00 00 00 00 00 00 00  ................
The 2 bytes at offset 0x38 (01 00) are the number of blocks in this extent. For a file of at most 4,096 bytes, this will be 1. Larger files will have more blocks in the extent.
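The 2 bytes at offset 0x3A (00 00) are the high (upper) 16 bits of the data block's address. Together with the 4 bytes at offset 0x3C, they form a 48-bit block number. The high bits are only needed on very large filesystems: more than 2^32 blocks, which means larger than 16 TiB with 4,096-byte blocks.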
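Here's a minimal Python sketch that decodes one 12-byte extent entry (ext4's struct ext4_extent) into its parts. The parts are the first logical block, the length, and the 48-bit physical start block built from the high and low halves. It also applies the ext4 rule that a length greater than 32,768 marks an uninitialized extent. `decode_extent` is a hypothetical helper, and the input is bytes 0x34 to 0x3F of the cake inode above.

```python
import struct

def decode_extent(entry):
    """Decode one 12-byte ext4 leaf extent (little-endian).

    Layout: ee_block (u32), ee_len (u16), ee_start_hi (u16), ee_start_lo (u32).
    Returns (first_logical_block, length, physical_start_block, initialized).
    """
    ee_block, ee_len, start_hi, start_lo = struct.unpack("<IHHI", entry)
    initialized = ee_len <= 32768               # larger values flag an uninitialized extent
    length = ee_len if initialized else ee_len - 32768
    return ee_block, length, (start_hi << 32) | start_lo, initialized

# Bytes 0x34-0x3F of the cake inode: logical 0, 1 block, physical block 0x002460F8.
print(decode_extent(bytes.fromhex("00 00 00 00 01 00 00 00 F8 60 24 00")))
```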
As an example, let's look at /usr/bin/zip which is clearly larger than a single block:
ls -li /usr/bin/zip
Output:
661483 -rwxr-xr-x 1 root root 188,296 Oct 21 2013 /usr/bin/zip
Find out which block contains inode 661483:
sudo debugfs -R 'imap <661483>' /dev/sda1
Output:
Inode 661483 is part of block group 80 located at block 2621854, offset 0x0a00
And then dump the inode:
sudo readblock /dev/sda1 2621854 0xa00 256 | dumpit
Partial output:
        00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
--------------------------------------------------------------------------
000000  ED 81 00 00 88 DF 02 00 2A 01 AD 58 2A 01 AD 58  ........*..X*..X
000010  DF 38 65 52 00 00 00 00 00 00 01 00 70 01 00 00  .8eR........p...
000020  00 00 08 00 01 00 00 00 0A F3 01 00 04 00 00 00  ................
000030  00 00 00 00 00 00 00 00 2E 00 00 00 ED F2 2A 00  ..............*.
000040  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000060  00 00 00 00 CD C7 2E D2 00 00 00 00 00 00 00 00  ................
The inode is telling us that the extent starts with block 0x002AF2ED (offset 0x3C) and extends for 46 (0x002E, offset 0x38) blocks. If you do the math:
4,096 * 46 = 188,416
you can see that exactly 120 (188,416 - 188,296) bytes are unused in the last block. We can also see how many blocks the file uses by running the stat command:
stat /usr/bin/zip
  File: '/usr/bin/zip'
  Size: 188296          Blocks: 368        IO Block: 4096   regular file
Device: 801h/2049d      Inode: 661483      Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2017-02-21 19:10:34.696610844 -0800
Modify: 2013-10-21 07:23:27.000000000 -0700
Change: 2017-02-21 19:10:34.696610844 -0800
 Birth: -
It tells us the file consumes 368 blocks. But wait, the inode said there were only 46 blocks. What gives? The stat command is telling us how many 512-byte blocks the file uses; the inode stores the same count at offset 0x1C (70 01 00 00 is 368). Since the filesystem uses 4,096-byte blocks, just divide the value from stat by 8 and you'll get 46.
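The two conversions in the last few paragraphs can be sketched together. First, turn stat's 512-byte block count into 4,096-byte filesystem blocks. Then subtract the file size to get the unused (slack) bytes in the last block. This sketch assumes every allocated block holds file data: no holes, and no separate extent-tree blocks, which would also be counted. `block_usage` is a hypothetical helper.

```python
BLOCK_SIZE = 4096  # this filesystem's block size
SECTOR = 512       # unit of stat's "Blocks:" field and of the inode's block count

def block_usage(size, stat_blocks):
    """Convert stat's 512-byte block count to filesystem blocks and slack bytes."""
    fs_blocks = stat_blocks * SECTOR // BLOCK_SIZE
    slack = fs_blocks * BLOCK_SIZE - size  # unused bytes at the end of the last block
    return fs_blocks, slack

print(block_usage(188_296, 368))  # /usr/bin/zip: (46, 120)
```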
OK, but what if, for some reason, not all of the blocks are contiguous? Maybe you have a really large file that does have "gaps" between its blocks. Let's look at this file on my system:
ls -li /usr/bin/rosegarden
Output:
669122 -rwxr-xr-x 1 root root 15,863,224 Oct 22 2013 /usr/bin/rosegarden
This rosegarden file is over 15 megabytes in size and may not be 100% contiguous. Let's run stat on it first to see the output:
  File: '/usr/bin/rosegarden'
  Size: 15863224        Blocks: 30984      IO Block: 4096   regular file
Device: 801h/2049d      Inode: 669122      Links: 1
Access: (0755/-rwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2018-09-23 12:20:30.000000000 -0700
Modify: 2013-10-22 05:47:12.000000000 -0700
Change: 2018-09-23 12:20:32.525504593 -0700
 Birth: -
We can see that there are 30,984 512-byte blocks, or 3,873 I/O blocks (4,096 bytes each). Using the debugfs command, we can see some information about the extents:
sudo debugfs -R 'stat /usr/bin/rosegarden' /dev/sda1
Output:
Inode: 669122   Type: regular    Mode:  0755   Flags: 0x80000
Generation: 1883018732    Version: 0x00000000:00000001
User:     0   Group:     0   Size: 15863224
File ACL: 0    Directory ACL: 0
Links: 1   Blockcount: 30984
Fragment:  Address: 0    Number: 0    Size: 0
 ctime: 0x5ba7e780:7d4a4144 -- Sun Sep 23 12:20:32 2018
 atime: 0x5ba7e77e:00000000 -- Sun Sep 23 12:20:30 2018
 mtime: 0x526673d0:00000000 -- Tue Oct 22 05:47:12 2013
crtime: 0x5ba7e780:60ae0954 -- Sun Sep 23 12:20:32 2018
Size of extra inode fields: 28
EXTENTS:
(0-2047):6352896-6354943, (2048-3872):6356992-6358816
This command produces a lot more information. The lines at the bottom tell us that there are 2 extents (i.e. 2 contiguous sets of blocks). The first extent is 2,048 blocks long (with the corresponding block addresses), and the second extent is 1,825 blocks long. If you add those numbers together (2,048 + 1,825), you'll get 3,873, the number of 4,096-byte I/O blocks used by the file.
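If you want to check that arithmetic programmatically, here's a sketch that parses the EXTENTS line from debugfs into (logical start, length, physical start) triples. It assumes leaf-only output in the range form shown above. As a further assumption, it also accepts a single-block form such as (0):1234, and it skips anything that doesn't start with a number in parentheses (for example, extent-tree index entries). `parse_extents` is a made-up name.

```python
import re

# (lstart[-lend]):pstart[-pend]
EXTENT_RE = re.compile(r"\((\d+)(?:-(\d+))?\):(\d+)(?:-(\d+))?")

def parse_extents(text):
    """Parse debugfs 'EXTENTS:' text into (logical_start, length, physical_start)."""
    extents = []
    for m in EXTENT_RE.finditer(text):
        lstart = int(m.group(1))
        lend = int(m.group(2) or lstart)  # a missing end means a single block
        extents.append((lstart, lend - lstart + 1, int(m.group(3))))
    return extents

line = "(0-2047):6352896-6354943, (2048-3872):6356992-6358816"
extents = parse_extents(line)
print(extents)
print("total blocks:", sum(length for _, length, _ in extents))
```

For the rosegarden line this gives [(0, 2048, 6352896), (2048, 1825, 6356992)] and a total of 3,873 blocks, matching stat.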
Here's the partial inode for the rosegarden file:
        00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
--------------------------------------------------------------------------
000000  ED 81 00 00 B8 0D F2 00 7E E7 A7 5B 80 E7 A7 5B  ........~..[...[
000010  D0 73 66 52 00 00 00 00 00 00 01 00 08 79 00 00  .sfR.........y..
000020  00 00 08 00 01 00 00 00 0A F3 02 00 04 00 00 00  ................
000030  00 00 00 00 00 00 00 00 00 08 00 00 00 F0 60 00  ..............`.
000040  00 08 00 00 21 07 00 00 00 00 61 00 00 00 00 00  ....!.....a.....
000050  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
000060  00 00 00 00 EC 95 3C 70 00 00 00 00 00 00 00 00  ......<p........
The 2 bytes at offset 0x2A (02 00) tell us that there are 2 extents in this file.
The 2 bytes at offset 0x2C (04 00) tell us that this inode can hold at most 4 extents.
The file size (the 4 bytes at offset 0x04, 0x00F20DB8) is decimal 15,863,224, which is what the other commands told us.
Each extent entry is 12 bytes long. The first one starts at offset 0x34: its first logical block is 0 (00 00 00 00), its length is 2,048 blocks (00 08 is 0x0800), and it begins at physical block 0x0060F000 (decimal 6352896). The second one starts at offset 0x40: its first logical block is 2,048 (00 08 00 00 is 0x00000800), which picks up right where the first extent leaves off; its length is 1,825 blocks (21 07 is 0x0721); and it begins at physical block 0x00610000 (decimal 6356992). That is exactly what debugfs reported, and 2,048 + 1,825 = 3,873 blocks, the number of blocks used by the file.
Why two extents instead of one? The first extent ends at physical block 6354943, but the second one doesn't begin until block 6356992, so there is a gap of 2,048 blocks between them on the disk. Because the file isn't completely contiguous, a single extent can't describe it.
Keep in mind that the filesystem (via the inode) knows how large the file is, so it isn't going to "accidentally" read the unused bytes at the end of the last block. The 3,873 blocks can hold 15,863,808 bytes, but only the first 15,863,224 (the file size) are valid, which leaves 584 unused bytes. (ext4 can also reserve extra blocks at the end of a file so that the file can grow without becoming fragmented. That is more complicated and beyond the scope of this introduction. Follow some of the links below, if you're interested.)
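As a cross-check of the field layout, here's a sketch that decodes the extent header at offset 0x28 of the rosegarden inode and the leaf extents that follow it. It assumes the standard ext4 layout: a 12-byte header (magic 0xF30A, entry count, maximum entries, depth, generation) followed by 12-byte extents. It handles only a depth-0 tree (extents stored directly in the inode) and doesn't handle the uninitialized-extent flag. `inline_extents` is a made-up name, and the bytes are the partial dump above.

```python
import struct

EXT4_EXT_MAGIC = 0xF30A

# First 0x70 bytes of the rosegarden inode from the partial dump above.
ROSEGARDEN_INODE = bytes.fromhex(
    "ED 81 00 00 B8 0D F2 00 7E E7 A7 5B 80 E7 A7 5B"
    " D0 73 66 52 00 00 00 00 00 00 01 00 08 79 00 00"
    " 00 00 08 00 01 00 00 00 0A F3 02 00 04 00 00 00"
    " 00 00 00 00 00 00 00 00 00 08 00 00 00 F0 60 00"
    " 00 08 00 00 21 07 00 00 00 00 61 00 00 00 00 00"
    " 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00"
    " 00 00 00 00 EC 95 3C 70 00 00 00 00 00 00 00 00"
)

def inline_extents(inode):
    """Decode the extent header and leaf extents stored in i_block (offset 0x28)."""
    magic, entries, max_entries, depth, _gen = struct.unpack_from("<HHHHI", inode, 0x28)
    if magic != EXT4_EXT_MAGIC:
        raise ValueError("inode does not use extents")
    if depth != 0:
        raise NotImplementedError("extent tree index nodes are not handled here")
    extents = []
    for i in range(entries):  # each leaf extent is 12 bytes, starting at 0x34
        ee_block, ee_len, hi, lo = struct.unpack_from("<IHHI", inode, 0x34 + 12 * i)
        extents.append((ee_block, ee_len, (hi << 32) | lo))
    return max_entries, extents

max_entries, extents = inline_extents(ROSEGARDEN_INODE)
print("room for", max_entries, "extents:", extents)
for (_, n1, p1), (_, _, p2) in zip(extents, extents[1:]):
    print(f"gap on disk between extents: {p2 - (p1 + n1)} blocks")
```

It reports room for 4 extents, the same two extents that debugfs listed, and a 2,048-block gap on disk between them.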
Some obvious questions:
Notes:
References
Links