[assembly] What's the difference between a word and byte?

I've done some research. A byte is 8 bits and a word is the smallest unit that can be addressed on memory. The exact length of a word varies. What I don't understand is what's the point of having a byte? Why not say 8 bits?

I asked a prof this question and he said most machines these days are byte-addressable, but what would that make a word?

This question is related to assembly memory hardware terminology cpu-architecture

The answer is


Why not say 8 bits?

Because not all machines have 8-bit bytes. Since you tagged this C, look up CHAR_BIT in limits.h.


Whatever the terminology present in datasheets and compilers, a 'Byte' is eight bits. Let's not try to confuse enquirers and generalities with the more obscure exceptions, particularly as the word 'Byte' comes from the expression "By Eight". I've worked in the semiconductor/electronics industry for over thirty years and not once known 'Byte' used to express anything more than eight bits.


It seems all the answers assume high level languages and mainly C/C++.

But the question is tagged "assembly" and in all assemblers I know (for 8bit, 16bit, 32bit and 64bit CPUs), the definitions are much more clear:

byte  = 8 bits 
word  = 2 bytes
dword = 4 bytes = 2Words (dword means "double word")
qword = 8 bytes = 2Dwords = 4Words ("quadruple word")

The exact length of a word varies. What I don't understand is what's the point of having a byte? Why not say 8 bits?

Even though the length of a word varies, on all modern machines and even all older architectures that I'm familiar with, the word size is still a multiple of the byte size. So there is no particular downside to using "byte" over "8 bits" in relation to the variable word size.

Beyond that, here are some reasons to use byte (or octet1) over "8 bits":

  1. Larger units are just convenient to avoid very large or very small numbers: you might as well ask "why say 3 nanoseconds when you could say 0.000000003 seconds" or "why say 1 kilogram when you could say 1,000 grams", etc.
  2. Beyond the convenience, the unit of a byte is somehow as fundamental as 1 bit since many operations typically work not at the byte level, but at the byte level: addressing memory, allocating dynamic storage, reading from a file or socket, etc.
  3. Even if you were to adopt "8 bit" as a type of unit, so you could say "two 8-bits" instead of "two bytes", it would be often be very confusing to have your new unit start with a number. For example, if someone said "one-hundred 8-bits" it could easily be interpreted as 108 bits, rather than 100 bits.

1 Although I'll consider a byte to be 8 bits for this answer, this isn't universally true: on older machines a byte may have a different size (such as 6 bits. Octet always means 8 bits, regardless of the machine (so this term is often used in defining network protocols). In modern usage, byte is overwhelmingly used as synonymous with 8 bits.


Reference:https://www.os-book.com/OS9/slide-dir/PPT-dir/ch1.ppt

The basic unit of computer storage is the bit. A bit can contain one of two values, 0 and 1. All other storage in a computer is based on collections of bits. Given enough bits, it is amazing how many things a computer can represent: numbers, letters, images, movies, sounds, documents, and programs, to name a few. A byte is 8 bits, and on most computers it is the smallest convenient chunk of storage. For example, most computers don’t have an instruction to move a bit but do have one to move a byte. A less common term is word, which is a given computer architecture’s native unit of data. A word is made up of one or more bytes. For example, a computer that has 64-bit registers and 64- bit memory addressing typically has 64-bit (8-byte) words. A computer executes many operations in its native word size rather than a byte at a time. Computer storage, along with most computer throughput, is generally measured and manipulated in bytes and collections of bytes. A kilobyte, or KB, is 1,024 bytes a megabyte, or MB, is 1,024 2 bytes a gigabyte, or GB, is 1,024 3 bytes a terabyte, or TB, is 1,024 4 bytes a petabyte, or PB, is 1,024 5 bytes Computer manufacturers often round off these numbers and say that a megabyte is 1 million bytes and a gigabyte is 1 billion bytes. Networking measurements are an exception to this general rule; they are given in bits (because networks move data a bit at a time)


A group of 8 bits is called a byte ( with the exception where it is not :) for certain architectures )

A word is a fixed sized group of bits that are handled as a unit by the instruction set and/or hardware of the processor. That means the size of a general purpose register ( which is generally more than a byte ) is a word

In the C, a word is most often called an integer => int


In this context, a word is the unit that a machine uses when working with memory. For example, on a 32 bit machine, the word is 32 bits long and on a 64 bit is 64 bits long. The word size determines the address space.

In programming (C/C++), the word is typically represented by the int_ptr type, which has the same length as a pointer, this way abstracting these details.

Some APIs might confuse you though, such as Win32 API, because it has types such as WORD (16 bits) and DWORD (32 bits). The reason is that the API was initially targeting 16 bit machines, then was ported to 32 bit machines, then to 64 bit machines. To store a pointer, you can use INT_PTR. More details here and here.


The terms of BYTE and WORD are relative to the size of the processor that is being referred to. The most common processors are/were 8 bit, 16 bit, 32 bit or 64 bit. These are the WORD lengths of the processor. Actually half of a WORD is a BYTE, whatever the numerical length is. Ready for this, half of a BYTE is a NIBBLE.


A word is the size of the registers in the processor. This means processor instructions like, add, mul, etc are on word-sized inputs.

But most modern architectures have memory that is addressable in 8-bit chunks, so it is convenient to use the word "byte".


BYTE

I am trying to answer this question from C++ perspective.

The C++ standard defines ‘byte’ as “Addressable unit of data large enough to hold any member of the basic character set of the execution environment.”

What this means is that the byte consists of at least enough adjacent bits to accommodate the basic character set for the implementation. That is, the number of possible values must equal or exceed the number of distinct characters. In the United States, the basic character sets are usually the ASCII and EBCDIC sets, each of which can be accommodated by 8 bits. Hence it is guaranteed that a byte will have at least 8 bits.

In other words, a byte is the amount of memory required to store a single character.

If you want to verify ‘number of bits’ in your C++ implementation, check the file ‘limits.h’. It should have an entry like below.

#define CHAR_BIT      8         /* number of bits in a char */

WORD

A Word is defined as specific number of bits which can be processed together (i.e. in one attempt) by the machine/system. Alternatively, we can say that Word defines the amount of data that can be transferred between CPU and RAM in a single operation.

The hardware registers in a computer machine are word sized. The Word size also defines the largest possible memory address (each memory address points to a byte sized memory).

Note – In C++ programs, the memory addresses points to a byte of memory and not to a word.


If a machine is byte-addressable and a word is the smallest unit that can be addressed on memory then I guess a word would be a byte!


What I don't understand is what's the point of having a byte? Why not say 8 bits?

Apart from the technical point that a byte isn't necessarily 8 bits, the reasons for having a term is simple human nature:

  • economy of effort (aka laziness) - it is easier to say "byte" rather than "eight bits"

  • tribalism - groups of people like to use jargon / a private language to set them apart from others.

Just go with the flow. You are not going to change 50+ years of accumulated IT terminology and cultural baggage by complaining about it.


FWIW - the correct term to use when you mean "8 bits independent of the hardware architecture" is "octet".


In fact, in common usage, word has become synonymous with 16 bits, much like byte has with 8 bits. Can get a little confusing since the "word size" on a 32-bit CPU is 32-bits, but when talking about a word of data, one would mean 16-bits. Microcontrollers with a 32-bit word size have taken to calling their instructions "longs" (supposedly to try and avoid the word/doubleword confusion).


Examples related to assembly

Why does C++ code for testing the Collatz conjecture run faster than hand-written assembly? While, Do While, For loops in Assembly Language (emu8086) Replacing a 32-bit loop counter with 64-bit introduces crazy performance deviations with _mm_popcnt_u64 on Intel CPUs How to run a program without an operating system? Difference between "move" and "li" in MIPS assembly language Carry Flag, Auxiliary Flag and Overflow Flag in Assembly How do AX, AH, AL map onto EAX? JNZ & CMP Assembly Instructions Difference between JE/JNE and JZ/JNZ The point of test %eax %eax

Examples related to memory

How does the "view" method work in PyTorch? How do I release memory used by a pandas dataframe? How to solve the memory error in Python Docker error : no space left on device Default Xmxsize in Java 8 (max heap size) How to set Apache Spark Executor memory What is the best way to add a value to an array in state How do I read a large csv file with pandas? How to clear variables in ipython? Error occurred during initialization of VM Could not reserve enough space for object heap Could not create the Java virtual machine

Examples related to hardware

Difference between nVidia Quadro and Geforce cards? What's the difference between a word and byte? Which Android phones out there do have a gyroscope? What are the ascii values of up down left right? How to fast get Hardware-ID in C#? Can I write or modify data on an RFID tag?

Examples related to terminology

The differences between initialize, define, declare a variable What is the difference between a web API and a web service? What does "opt" mean (as in the "opt" directory)? Is it an abbreviation? What's the name for hyphen-separated case? What is Bit Masking? What is ADT? (Abstract Data Type) What exactly are iterator, iterable, and iteration? What is a web service endpoint? What is the difference between Cloud, Grid and Cluster? How to explain callbacks in plain english? How are they different from calling one function from another function?

Examples related to cpu-architecture

Write-back vs Write-Through caching? Undefined symbols for architecture x86_64 on Xcode 6.1 Difference between core and processor What is difference between sjlj vs dwarf vs seh? how much memory can be accessed by a 32 bit machine? What's the difference between a word and byte? Difference between x86, x32, and x64 architectures? What is the difference between Trap and Interrupt?