The profiler trace puts it into perspective.
Query B is using parallelism: CPU > duration eg the query uses 2 CPUs, average 1.15 secs each
Query A is probably not: CPU < duration
This explains cost relative to batch: 17% of the for the simpler, non-parallel query plan.
The optimiser works out that query B is more expensive and will benefit from parallelism, even though it takes extra effort to do so.
Remember though, that query B uses 100% of 2 CPUS (so 50% for 4 CPUs) for one second or so. Query A uses 100% of a single CPU for 1.5 seconds.
The peak for query A is lower, at the expense of increased duration. With one user, who cares? With 100, perhaps it makes a difference...