If you have a 200,000,000-character file and split it every five characters, you have 40,000,000 String objects. Assume they share the actual character data with the original 400 MB String (a char is 2 bytes). A String object is, say, 32 bytes, so that is 1,280,000,000 bytes (about 1.28 GB) of String objects, on top of the 400 MB of character data.
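The arithmetic above can be reproduced with a few lines of Java. This is a sketch under the same assumptions as the text: 2 bytes per char and a rough 32 bytes per String object, a figure that varies by JVM. The class and method names are made up for illustration.

```java
// Back-of-the-envelope estimate for splitting a large text into small Strings.
// The 32-bytes-per-String figure is an assumption; real overhead depends on the JVM.
class SplitMemoryEstimate {
    static final long BYTES_PER_CHAR = 2;           // a Java char is a 16-bit UTF-16 code unit
    static final long BYTES_PER_STRING_OBJECT = 32; // assumed header + fields, JVM dependent

    /** Pieces when splitting every pieceLength chars; a trailing partial piece is not counted. */
    static long pieceCount(long chars, long pieceLength) {
        return chars / pieceLength;
    }

    /** Bytes of character data (shared with the original String in this scenario). */
    static long charDataBytes(long chars) {
        return chars * BYTES_PER_CHAR;
    }

    /** Bytes spent on the String objects themselves, excluding character data. */
    static long stringObjectBytes(long pieces) {
        return pieces * BYTES_PER_STRING_OBJECT;
    }

    public static void main(String[] args) {
        long chars = 200_000_000L;
        long pieces = pieceCount(chars, 5);
        System.out.println("pieces:              " + pieces);
        System.out.println("char data bytes:     " + charDataBytes(chars));
        System.out.println("String object bytes: " + stringObjectBytes(pieces));
    }
}
```

Running it prints 40000000 pieces, 400000000 bytes of character data and 1280000000 bytes of String objects.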
(It's probably worth noting that this is very implementation dependent. split could create strings with entirely new backing char[] arrays or, on the other hand, share some common String values. Some Java implementations do not slice a shared char[] at all; for example, since Java 7 update 6, OpenJDK's substring copies the characters rather than sharing the original array. Some may use a UTF-8-like compact form and give very poor random access times.)
Even assuming longer strings, that's a lot of objects. With that much data, you probably want to keep most of it in a compact form like the original (only with indexes) and convert to objects only the parts you need. The implementation should be database-like (although databases traditionally don't handle variable-length strings efficiently).
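A minimal sketch of the "compact form with indexes" idea, assuming the whole text fits in a single char[]. CompactPieces is a hypothetical class, not a library API: it stores the characters once plus an int[] of piece start offsets, and only creates a String when get is called. For the fixed-width case the offsets could simply be computed as i * width; the array is kept because it generalizes to variable-length pieces.

```java
// Hypothetical sketch: one shared char[] plus an int[] of offsets instead of one String per piece.
final class CompactPieces {
    private final char[] data;  // all characters, stored once
    private final int[] starts; // starts[i] = offset of piece i; starts[size()] == data.length

    private CompactPieces(char[] data, int[] starts) {
        this.data = data;
        this.starts = starts;
    }

    /** Splits text into fixed-width pieces (the last may be shorter) without creating Strings. */
    static CompactPieces fixedWidth(String text, int width) {
        if (width <= 0) throw new IllegalArgumentException("width must be positive");
        char[] data = text.toCharArray(); // one copy of the text
        int count = (data.length + width - 1) / width;
        int[] starts = new int[count + 1];
        for (int i = 0; i < count; i++) {
            starts[i] = i * width;
        }
        starts[count] = data.length;
        return new CompactPieces(data, starts);
    }

    int size() {
        return starts.length - 1;
    }

    int length(int i) {
        return starts[i + 1] - starts[i];
    }

    /** Materializes piece i as a String only when the caller actually needs an object. */
    String get(int i) {
        return new String(data, starts[i], length(i));
    }

    /** Compares piece i with s character by character, without allocating a String. */
    boolean contentEquals(int i, CharSequence s) {
        int len = length(i);
        if (s.length() != len) return false;
        int base = starts[i];
        for (int k = 0; k < len; k++) {
            if (data[base + k] != s.charAt(k)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        CompactPieces pieces = CompactPieces.fixedWidth("abcdefghijkl", 5);
        System.out.println(pieces.size());                    // 3
        System.out.println(pieces.get(2));                    // kl
        System.out.println(pieces.contentEquals(1, "fghij")); // true
    }
}
```

With 40,000,000 pieces the offset array costs 4 bytes per piece (about 160 MB) instead of one String object per piece. toCharArray makes one copy of the text, so a real implementation would more likely read the file straight into a char[] or a java.nio.CharBuffer.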