Mike,
The capability you are asking for does not exist, at least not within the LA HW. For the most part, the data from a single physical external channel is connected directly to a single physical internal bit location. This means that if you only have 3 physical wires connected to your target, about 96% (65 of 68 channels) of the LA memory will go unused.
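Here is a short Python sketch of the arithmetic behind that figure. It assumes a 68-channel card with the fixed one-wire-to-one-bit mapping described above. The function name is made up for illustration.

```python
def unused_fraction(wires_used: int, total_channels: int) -> float:
    """Fraction of LA acquisition memory left idle, assuming each external
    channel maps to exactly one internal bit location (hypothetical helper)."""
    if not 0 <= wires_used <= total_channels:
        raise ValueError("wires_used must be between 0 and total_channels")
    return (total_channels - wires_used) / total_channels


# 3 probed wires on an assumed 68-channel card
print(f"{unused_fraction(3, 68):.1%}")  # → 95.6%
```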
With some buses in State mode, you can use the Master/Slave or Demux modes to improve the situation by a factor of two. In Timing mode, you can switch to half-channel/double-depth mode, which gives up half the channels in exchange for twice the memory depth.
I had a very large customer who asked for this capability a few years ago. They only needed 1->4 fanout (16 -> 64 channels). It required an external box with some special programming, and it only worked with the 16900 family of analyzers and cards. In their case, they were using the 256M cards, so the 4x fanout gave them 1G samples per capture. It was a pretty expensive solution.
As you note, besides the HW changes needed to accommodate this, a number of SW changes would be required to reconstruct the data into a single stream. There are tools that could do this on the 16900 family, and in theory you could also do it on a 16700 using the optional SW development environment (whose name escapes me just now).
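To show what "reconstructing the data into a single stream" involves, here is a minimal Python sketch. The capture layout is a HYPOTHETICAL assumption, not the actual format of that box or of any 16900 tool: each captured row is n_wires * fanout bits wide, and bit `phase * n_wires + wire` holds sample `phase` (oldest first) of that wire. The function name `reconstruct` is also made up for illustration.

```python
from typing import List, Sequence


def reconstruct(rows: Sequence[Sequence[int]], n_wires: int, fanout: int) -> List[List[int]]:
    """Rebuild one time-ordered sample stream per physical wire.

    HYPOTHETICAL layout (assumption for illustration only): each row is
    n_wires * fanout bits wide, and bit phase * n_wires + wire holds
    sample number `phase` (0 = oldest) of that wire within the row.
    """
    width = n_wires * fanout
    streams: List[List[int]] = [[] for _ in range(n_wires)]
    for row in rows:
        if len(row) != width:
            raise ValueError(f"expected {width} bits per row, got {len(row)}")
        # Walk the phases in time order so each wire's samples stay ordered.
        for phase in range(fanout):
            for wire in range(n_wires):
                streams[wire].append(row[phase * n_wires + wire])
    return streams


# Two wires fanned out 1->4, so each captured row is 8 bits wide.
rows = [
    [0, 1, 1, 1, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 1, 0, 1],
]
streams = reconstruct(rows, n_wires=2, fanout=4)
print(streams[0])  # → [0, 1, 0, 1, 1, 1, 0, 0]
print(streams[1])  # → [1, 1, 0, 0, 0, 0, 1, 1]
```

Under this assumed layout, every captured row yields `fanout` samples per wire. That is why a 1->4 fanout on 256M-deep cards works out to 1G samples per wire.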
Depending on exactly what your signals look like, there might be another solution. Advanced Logical Devices (http://www.ald.com) has an external box for connecting to I2C, SPI and RS-232. It converts the serial bitstream to parallel and passes it to the LA that way. On at least one of these buses, they also have a decoder. If your bitstream looks like one of the buses they support, you might be able to get much better memory utilization out of your LA.
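The serial-to-parallel idea can be illustrated with a minimal Python sketch. It is a generic shift-register deserializer and does not describe ALD's product. It assumes one data bit per clock edge and 8-bit words, with MSB-first order as the default (typical for SPI). The helper name `bits_to_bytes` is made up for illustration.

```python
from typing import Iterable, List


def bits_to_bytes(bits: Iterable[int], msb_first: bool = True) -> List[int]:
    """Pack a serial bitstream (one bit per clock edge) into 8-bit words.

    Assumptions: 8-bit words, MSB first by default. A trailing partial
    word (fewer than 8 bits) is dropped.
    """
    words: List[int] = []
    shift = 0
    count = 0
    for bit in bits:
        if bit not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        if msb_first:
            shift = (shift << 1) | bit   # shift left, new bit enters at the LSB
        else:
            shift |= bit << count        # first bit received becomes bit 0
        count += 1
        if count == 8:
            words.append(shift)
            shift = 0
            count = 0
    return words


print(bits_to_bytes([1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 0]))  # → [165, 60]
```

Each stored word packs 8 serial clock cycles. One LA sample then carries 8 bits of bus data instead of 1, which is where the better memory utilization would come from.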
Al