Scratching the surface of modern Computers and CPUs
September 30, 2024 #Scratching the CPU #CPU #Assembly #Why and How of computers #Low-Level-Computing #Binary #Transistors #LogicOperations #Microcoding #HardwareManipulation #Kernel #K_logTelevised #OperatingSystem #GNU/Linux #UNIX #Android #MacOS #BSD #AssemblyLanguage #MachineCode #DigitalSystems #PerformanceOptimization #HDL #VHDL #Verilog #Programming #CLanguage #Userland #PackageManager #GUI #SoftwareDevelopment #Security
01010111 01101000 01111001 00100000 01110100 01101000 01101001 01110011 ?
Okay, so we've all watched the movies and seen people hacking away at black screens full of zeros and ones while magic happens. We're all convinced that yes, there's some 01 something happening somewhere, and we've probably all opened a paint program or a file editor, never mind something like a video game, and wondered: HOOOOW?!
Okay, so I'll try to explain it in short. But in long, we will build an entire Operating System (OS). Not a very original one, to be honest: mostly bootlegged pieces put together and glued with superglue. But we will go over everything and explain why and how it fits together, and by the end of it you will hopefully understand it all. Well, kind of: you will understand why it's hard for anyone to truly understand everything.
We will be talking about GNU/Linux specifically, which is modeled on UNIX (it is "Unix-like" rather than built from UNIX's original code). Android is Linux-based (it has its own userland, so Android/Linux, I guess?), and macOS (and iOS) is built on Darwin, which has BSD UNIX roots. The same goes for the BSDs (FreeBSD, OpenBSD...), and FreeBSD is what the system software of recent Sony PlayStations is based on. Windows is a bit different (in a bad way), but not that different, at least in terms of the core concepts.
Basically, an OS has two parts: the kernel (Linux) and the userland (GNU, or something else, like the userland Android ships). For Linux these are two different projects developed independently, and most Linux distros then build something like an app store (a package manager) on top of them and preconfigure everything with a Graphical User Interface (GUI) and software for people who don't want to deal with (operating) system administration. That includes the Linux founder, by the way, so if you ever feel a bit dizzy from all of this, don't worry, you'll get used to it.
All the other OSs have everything unified, built by teams that are sometimes the size of entire companies. I think that's one of the reasons people assume there's a clear, linear path from 01 to a GUI. But let's carry on and explain what is what.
So the kernel abstracts away the need for a user to interact with hardware directly. You can think of it as a mini logic system that translates the 01000010 01001001 01001110 01000001 01010010 01011001 stuff into something humans (and their programs) can work with. The kernel itself is programmed mostly in C, a language that can let you interact with the hardware but was designed to abstract it away. Even one of the lowest-level programming languages, assembly (which we'll talk about shortly), is designed to abstract the zeros-and-ones stuff. And then there's the userland: setting it up is not done in plain English and there's a lot of clever assembling going on, but it's more like putting together Legos than solving an equation. So yes, if you ever see someone actually playing around with raw zeros and ones, they are probably trying to manipulate the system to gain some unauthorized access. There! I saved you a looot of time and digging if you just wanted an overview.
Now, if you want to get nitty-gritty and technical, we'll start here, trying to understand how it's even possible for 01000010 01001001 01001110 01000001 01010010 01011001 to do all this. It will be boring from time to time, but you will have a pieced-together OS at the end, so don't complain too much!
Oh yeah, and this will be more like a TV show with many seasons than a movie. But I will try to make it as concise (yeah, I know I can go on sometimes...) and engaging as possible, so thank you for tuning in, and let's go.
Why Binary?
- Simplicity and Reliability in Digital Systems: Binary simplifies the design of electronic circuits. Digital systems, including CPUs, rely on transistors that act as switches. These switches can be in one of two states, on or off, corresponding to binary 1 and 0. This two-state representation is the simplest and most reliable way to process information at the hardware level.
  - Example: A transistor in a CPU switches its output between a high voltage (representing 1) and a low voltage (representing 0). This clear distinction lets the CPU operate quickly and reliably, because each circuit only ever has to tell two states apart.
- Robust Against Noise: Binary systems are robust against electrical noise because they only need to distinguish between two states, so a signal can drift quite a bit before it is misread. This increases the noise immunity of digital circuits.
  - Example: In a noisy environment, a binary signal with a high threshold (e.g., anything above 2.5V is considered 1) and a low threshold (e.g., anything below 0.5V is 0) remains clear and easy to interpret even with minor voltage fluctuations. Voltages between the two thresholds are treated as undefined, and a well-designed circuit only passes through that zone briefly while switching.
- Efficient Logical Operations: Binary numbers align naturally with Boolean logic, which forms the basis of all digital computation. Logical operations such as AND, OR, and NOT are straightforward and efficient to implement in binary systems.
  - Example: A binary AND operation is simple and quick: 1 AND 1 = 1, while all other combinations result in 0. This simplicity allows complex logical functions to be built from basic binary operations.
The Role of Assembly Language
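Assembly language is a low-level programming language that provides a human-readable way to write the CPU's binary instructions. It serves as a critical layer of abstraction, making it easier to manage and manipulate the CPU directly. Each CPU architecture has its own assembly language; the examples below use 16-bit x86 assembly in Intel syntax (registers such as AX, BX, CX and AL).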
- Human-Readable Mnemonics:
Instead of dealing with raw binary or hexadecimal codes, assembly language uses mnemonic codes that are easier to understand and remember. Each assembly instruction corresponds to a specific machine code instruction but is written in a more accessible form.
- Example: Rather than writing 10111000 01100001 00000000 (hex B8 61 00, in 16-bit mode) to move the value 97 into the AX register, you write MOV AX, 61h in assembly (61h is 97 in hexadecimal). The mnemonic MOV makes it clear that the instruction moves a value into a register.
```asm
; Move the hexadecimal value 0x61 (97 decimal) into the AX register
MOV AX, 61h
```
- Direct Control over Hardware:
Assembly language allows programmers to manipulate hardware directly. This is particularly useful for tasks that require fine-grained control over the CPU and memory, such as writing operating systems, firmware, or performance-critical code.
- Example: If you want to talk to an I/O port on an x86 PC, you might use an instruction like OUT 0x60, AL to send the contents of the AL register to the port at address 0x60 (traditionally the data port of the PC keyboard controller). On a modern OS this only works in kernel mode or with special permissions, which is exactly the kind of direct access the kernel guards.
```asm
; Send the value in the AL register to the I/O port at address 0x60
OUT 0x60, AL
```
- Performance Optimization:
Writing in assembly enables detailed performance optimizations. Programmers can tailor their code to specific CPU features and instruction sets, and in some hot spots this beats what a compiler generates from a higher-level language (modern compilers are good enough that this is the exception, not the rule).
- Example: In a tight loop that processes data, an assembly programmer can minimize the number of instructions and keep values in CPU registers instead of memory, reducing overhead and speeding up execution.
```asm
; A simple counting loop: decrement CX and repeat until it reaches zero
MOV CX, 10          ; Initialize CX with 10
loop_start:
    ; (the loop body would go here)
    DEC CX          ; Decrement CX; sets the zero flag when CX becomes 0
    JNZ loop_start  ; Jump back to loop_start if CX is not zero
```
- Simplifying Complex Instructions:
Assembly abstracts away some of the complexities of binary and machine code, providing a more structured way to perform operations. For instance, assembly language allows for the use of symbolic names for memory addresses and variables, simplifying the programming process.
- Example: Instead of remembering the memory address 0x7FFE, an assembly programmer can define a symbolic name like BUFFER_START to refer to the start of a buffer, making the code easier to read and maintain.
```asm
; Give the buffer's starting address a symbolic name
BUFFER_START EQU 0x7FFE
; Store the contents of AX at the start of the buffer
MOV [BUFFER_START], AX
```
Beyond Assembly: Lower-Level Interactions
Interacting with the CPU below the level of assembly involves dealing directly with machine code or even manipulating the hardware itself. While this offers the ultimate control, it is complex and typically reserved for specialized applications or hardware design.
- Machine code programming
Machine code is the actual binary code executed by the CPU. Each instruction in machine code is a sequence of bits that the CPU decodes and executes. Programming in machine code involves writing these binary sequences directly, which is highly intricate and error-prone.
- Example: To add the contents of CX to AX (ADD AX, CX) in 16-bit x86 machine code, you might write 00000001 11001000 (hex 01 C8). The first byte is the opcode for ADD with a register-or-memory destination, and the second byte (the ModR/M byte) encodes which registers are involved. This level of programming requires an intimate understanding of the CPU's instruction set and encoding schemes, down to details like x86 often allowing more than one valid encoding (03 C1 also means ADD AX, CX).
```asm
; Machine code to add the contents of CX to AX, emitted as raw bytes
; 00000001 11001000 (hex 01 C8) = ADD AX, CX
db 0x01, 0xC8
```
- Microcoding:
Some CPUs use microcode to break complex machine instructions down into simpler internal operations. Microcode acts as a low-level layer that translates an instruction into a sequence of basic steps that the hardware executes. This level is normally only touched by CPU designers, although vendors do ship microcode updates that the firmware or OS loads at boot.
- Example: Inside a CPU, an instruction like MUL (multiply) might, in a simple design, be decomposed into a series of simpler steps controlled by microcode, such as shifting and adding, executed one after another to perform the multiplication. (Modern high-performance CPUs typically use dedicated multiplier hardware instead.)
- Direct Hardware Manipulation with HDLs:
For those involved in hardware design, interacting with CPUs can involve using Hardware Description Languages (HDLs) like VHDL or Verilog. These languages allow engineers to describe and simulate the behavior of electronic circuits at a very low level.
- Example: Using VHDL, an engineer might design a simple 4-bit ALU (Arithmetic Logic Unit) that performs addition, subtraction, and a bitwise AND:
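```vhdl
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;  -- needed for unsigned arithmetic on vectors

entity ALU is
    Port ( A      : in  STD_LOGIC_VECTOR (3 downto 0);
           B      : in  STD_LOGIC_VECTOR (3 downto 0);
           RESULT : out STD_LOGIC_VECTOR (3 downto 0);
           OP     : in  STD_LOGIC_VECTOR (1 downto 0));
end ALU;

architecture Behavioral of ALU is
begin
    process(A, B, OP)
    begin
        case OP is
            when "00" =>
                -- Addition (4-bit result, the carry-out is discarded)
                RESULT <= std_logic_vector(unsigned(A) + unsigned(B));
            when "01" =>
                -- Subtraction (wraps around modulo 16)
                RESULT <= std_logic_vector(unsigned(A) - unsigned(B));
            when "10" =>
                RESULT <= A and B;  -- Bitwise AND
            when others =>
                RESULT <= (others => '0');  -- Default: all zeros
        end case;
    end process;
end Behavioral;
```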
Detailed examination of logic operations
Logical operations are fundamental operations that manipulate binary data at the bit level. They are essential for decision-making, data manipulation, and control flow in computing systems. The most common logical operations are AND, OR, XOR (exclusive OR), and NOT. Each of them works bit by bit: AND, OR and XOR combine two binary numbers, while NOT transforms a single one.
- AND Operation
A | B | A AND B |
---|---|---|
0 | 0 | 0 |
0 | 1 | 0 |
1 | 0 | 0 |
1 | 1 | 1 |
The AND operation compares each bit of two binary numbers and returns 1 only if both corresponding bits are 1. This operation is often used for masking bits and setting specific conditions in digital circuits.
```asm
; Assume AX = 5 (0101) and BX = 3 (0011)
MOV AX, 5     ; Load 0101 into AX
MOV BX, 3     ; Load 0011 into BX
AND AX, BX    ; Perform AND operation: 0101 AND 0011 = 0001
; Result in AX is now 1 (0001)
```
- OR Operation
A | B | A OR B |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 1 |
The OR operation compares each bit of two binary numbers and returns 1 if at least one of the corresponding bits is 1. This operation is used to set specific bits in a binary number.
```asm
; Assume AX = 5 (0101) and BX = 3 (0011)
MOV AX, 5     ; Load 0101 into AX
MOV BX, 3     ; Load 0011 into BX
OR AX, BX     ; Perform OR operation: 0101 OR 0011 = 0111
; Result in AX is now 7 (0111)
```
Use Case: OR can be used to set specific bits in a number.
- XOR (Exclusive OR) Operation
A | B | A XOR B |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
The XOR operation compares each bit of two binary numbers and returns 1 only if the corresponding bits are different. This operation is often used for toggling bits or for error detection and correction.
```asm
; Assume AX = 5 (0101) and BX = 3 (0011)
MOV AX, 5     ; Load 0101 into AX
MOV BX, 3     ; Load 0011 into BX
XOR AX, BX    ; Perform XOR operation: 0101 XOR 0011 = 0110
; Result in AX is now 6 (0110)
```
Use Case: XOR can be used to toggle specific bits in a number.
- NOT Operation
A | NOT A |
---|---|
0 | 1 |
1 | 0 |
The NOT operation inverts each bit of a binary number (bitwise negation), changing 1s to 0s and 0s to 1s. This is useful for creating the complement of a number.
```asm
; Assume AX = 5 (00000000 00000101 as a 16-bit register)
MOV AX, 5     ; Load 5 into AX
NOT AX        ; Invert every bit: 00000000 00000101 -> 11111111 11111010
; AX is now FFFAh, which is -6 in two's complement representation
```
Use Case: NOT operation can be used in conjunction with addition to find the two's complement of a number.
Practical Implementation: Combining Logical Operations
- Example: Masking and Shifting. These are often used to isolate and manipulate specific bits within a byte or word.
```asm
MOV AX, 0xFF32   ; AX = 11111111 00110010
AND AX, 0x00FF   ; Mask out the upper byte: AX = 00000000 00110010
SHR AX, 2        ; Logical shift right by 2 bits: AX = 00000000 00001100
```
- Example: Bitwise Operations for Control. Logical operations can set or clear specific flags or bits in control registers.
```asm
MOV AL, 0x5A   ; AL = 01011010
OR  AL, 0x01   ; Set the least significant bit: AL = 01011011
AND AL, 0xFE   ; Clear the least significant bit: AL = 01011010
XOR AL, 0x0F   ; Toggle the lower 4 bits: AL = 01010101
```
Logical operations are integral to the functioning of CPUs and digital systems. They provide the foundation for decision-making, bit manipulation, and control processes in both software and hardware design. Understanding them in assembly language and beyond lets you manipulate data precisely and efficiently, and everything higher up the stack builds on them.
Whether using AND for masking, OR for setting bits, XOR for toggling, or NOT for bitwise negation, these operations form the core of binary processing. By leveraging these tools, programmers and engineers can unlock the full potential of their computing systems, from software applications to hardware implementations.
So basically
So the journey from 01 to a GUI app is a very long one. And it starts before 01.
First, we decide to use binary instead of the base-10 system to represent numbers, and we use it to represent the states of transistors: conducting electricity or not.
Second, we create shapes (circuits) that determine how electricity flows to and from other components, like keyboards (input) and screens (output). Based on these inputs, the CPU transitions into different states by performing logic operations (Boolean algebra) like AND and NOT.
Third, we combine these basic logic gates to form more complex circuits like adders, multiplexers, and memory units. These become the building blocks of the CPU and other computer components.
Fourth, we design instruction sets to drive these components, creating a language the CPU understands directly. This is machine code, which uses binary patterns to represent specific CPU instructions. The CPU then switches its transistors accordingly.
Fifth, we develop assembly language, which provides mnemonics and symbols that correspond to machine code instructions: a mini system of predesigned ways to write binary instructions. There are many assembly languages, by the way, because there are many different CPU architectures. This is the first step in making programming more accessible to humans, and for most programmers, even those working close to the hardware, it's as low as they ever go. Only around 2% of the Linux kernel is written in assembly.
Sixth, we create higher-level languages like C and C++. With them we build a kernel, a mini-system that abstracts away direct interaction with the hardware and allows it to be controlled through things like system calls. Once we piece together how all these files and programs call into one another, we have a minimal userland, and languages with even higher levels of abstraction, like Python and JavaScript, start becoming usable. C allows for system (userland) programming while still providing some hardware access, which is crucial for developing kernels and low-level system components. Most other languages don't have that built-in level of hardware access and go through C and assembly instead: your Python code runs on an interpreter written in C (CPython), and reaching lower-level features means calling compiled C code, for example through Python's ctypes module or some framework built on C libraries, which is again somewhat limited and directly dependent on C or assembly.
Finally, after doing all this and building terminal or graphical user interface utilities and software, like file editors, package managers, desktop environments and video games, we have something that looks like a modern operating system!
Stay tuned for future "episodes"!