Unicode is an agreed-upon scheme for representing characters (including distinct lower-case and upper-case letters, and control characters such as new line and carriage return) and other "things" (e.g. emojis) as numbers, and ultimately as bits. A computer is no less capable of storing a Unicode representation (a series of bits), whether in memory or in a file, than it is of storing an ASCII representation (a different series of bits), or any other representation (series of bits).
For communication to take place, the parties involved must agree on which representation will be used.
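To make that concrete, here is a small illustration in Python 3 (chosen here purely as an example language): the same series of bits yields different characters depending on which representation the reader assumes.

```python
# The same bits are meaningless until both parties agree on a representation.
data = "café".encode("utf-8")      # four characters become five bytes

print(data)                        # b'caf\xc3\xa9'
print(data.decode("utf-8"))        # café   (reading with the agreed representation)
print(data.decode("latin-1"))      # cafÃ©  (same bits, a different agreement)
```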
Because Unicode seeks to represent all the possible characters (and other "things") used in inter-human and inter-computer communication, it must cover a far larger set than systems of representation such as ASCII, and therefore needs more than one byte for many characters. To simplify matters, and to accommodate historical usage, Unicode text is almost always converted to a byte-oriented encoding, most commonly UTF-8 (which is backward compatible with ASCII), for the purpose of storing characters in files.
It is not that the in-memory Unicode representation cannot be written to a file, or transmitted through a communications channel, as-is; it is simply that, by convention, it is not.
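As an illustration of that conversion step, here is a short Python 3 sketch; out.txt is just an example filename, and UTF-8 is assumed as the target encoding:

```python
text = "naïve"

# UTF-8 can represent any Unicode character as a sequence of bytes.
print(text.encode("utf-8"))        # b'na\xc3\xafve'

# ASCII covers only a 128-character subset, so the conversion can fail.
try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    print(err)                     # 'ascii' codec can't encode character '\xef' ...

# Writing to a text-mode file performs the same conversion implicitly.
with open("out.txt", "w", encoding="utf-8") as f:
    f.write(text)                  # the characters are encoded to bytes on the way to disk
```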
The term "string," is not precisely defined. "String," in its common usage, refers to a set of characters/things. In a computer, those characters may be stored in any one of many different bit-by-bit representations. A "byte string" is a set of characters stored using a representation that uses eight bits (eight bits being referred to as a byte). Since, these days, computers use the unicode system (characters represented by a variable number of bytes) to store characters in memory, and byte strings (characters represented by single bytes) to store characters to files, a conversion must be used before characters represented in memory will be moved into storage in files.