So... you might or might not have noticed the other day that I posted in this thread for about 5 or 10 minutes before deleting my post. Basically, I felt like I was onto something, but because I didn't really have much of anything yet, it felt very awkward, and the mounting anxiety compelled me to delete the post until I had collected my thoughts into something with marginally more substance.
I figured I might as well make a thread of my own here, in the hope that I might find something and thus shed light on the otherwise seemingly undocumented .samp files, specifically the one in Paper Mario: The Thousand-Year Door. It's also basically a request for help from people who might be better at reading patterns in hex files than me, if/when I get stuck.
My main problem is that I have no experience reverse-engineering file formats; this is my first time. As a result, expect me to stumble over basic things like hex representations of integers and floats, along with basic header patterns! Who knows, maybe I'll learn something from this. That'll be great if I do! 'o'
So, as people have figured, there are sound effects in the pmario.samp file located in the TTYD ISO's /sound/proj/ folder. I've confirmed this by importing pmario.samp as Raw Data in Audacity. The only thing is that when I import the data, there is this awful LOUD, SCREECHY NOISE PERMEATING THE WHOLE FILE, SO IT'S NOT USABLE FOR ANYTHING BUT DAMAGING YOUR EARS. On the plus side, though, I could clearly hear Bowser/Mario/Peach's voice clips past the midpoint of the noise! Based on other docs I've read, the format might be the custom-ish 4-bit ADPCM format the 'Cube uses for other things, like streamed audio. Makes sense.
The format I used to import in Audacity to hear the sound files in the horrible screeching fashion was 8-bit PCM, Big Endian, Mono @ 11025Hz.
VOX ADPCM, Big Endian, Mono @ 22050Hz also produced similar results. Obviously these are unusable for actual game purposes, but they were valuable to me since they told me that the sounds seem to sit uncompressed/unscrambled in the file, one after another. It gives me hope that, while splitting up the sounds and programmatically naming them might be a horrible endeavor, I can at least theoretically find and steal some of the ADPCM decoding code from a project like vgmstream to use in an unpacker/extractor. I'll do this if I'm going nowhere with what I'm currently trying.
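If the data does turn out to be GameCube DSP ADPCM, the core of a decoder is pretty small. Below is a sketch of the standard DSP frame layout as I understand it from decoders like vgmstream: 1 header byte (high nibble picks one of 8 coefficient pairs, low nibble is a power-of-two shift), then 7 bytes holding 14 signed 4-bit samples. This is not confirmed for pmario.samp, `decode_dsp_frame` is a name I made up, and the 16 coefficients have to come from somewhere I haven't found yet.

```python
def clamp16(x: int) -> int:
    """Clamp to the signed 16-bit PCM range."""
    return max(-32768, min(32767, x))


def decode_dsp_frame(frame: bytes, coefs: list[int], hist1: int = 0, hist2: int = 0):
    """Decode one 8-byte DSP ADPCM frame into 14 PCM samples.

    Assumption: standard GameCube DSP layout. `coefs` is 16 signed 16-bit
    coefficients (8 predictor pairs); where TTYD stores them is unknown.
    Returns (samples, hist1, hist2) so the next frame can continue from here.
    """
    if len(frame) != 8:
        raise ValueError("DSP ADPCM frames are 8 bytes long")
    header = frame[0]
    predictor = header >> 4              # high nibble: which coefficient pair
    if predictor > 7:
        raise ValueError(f"predictor {predictor} out of range; probably not DSP ADPCM")
    scale = 1 << (header & 0x0F)         # low nibble: power-of-two scale
    coef1, coef2 = coefs[predictor * 2], coefs[predictor * 2 + 1]

    samples = []
    for i in range(14):
        byte = frame[1 + i // 2]
        nibble = (byte >> 4) if i % 2 == 0 else (byte & 0x0F)
        if nibble >= 8:                  # sign-extend the 4-bit value
            nibble -= 16
        # Coefficients are 11-bit fixed point; +1024 rounds before the shift.
        sample = ((nibble * scale) << 11) + 1024 + coef1 * hist1 + coef2 * hist2
        sample = clamp16(sample >> 11)
        samples.append(sample)
        hist2, hist1 = hist1, sample
    return samples, hist1, hist2
```

Feeding frames in order and carrying `hist1`/`hist2` forward should give continuous 16-bit PCM. A predictor index above 7 is a quick hint that the bytes aren't DSP ADPCM, or that the read is misaligned.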
Note: The hex editor I'm using is HxD, which lets me set how many bytes per row are shown in the hex and ASCII columns. I'm using a width of 32 instead of the usual 16, which makes things much clearer for me. The wall of text below assumes this width.
Anyway... besides the .db(2) files, there are also a bunch of other files named "pmario" in the same directory, with the following extensions:
.samp - Our prize right here, 10.7MB and confirmed by myself to contain (PCM-based?) audio samples. It appears to store the instrument samples for sequenced music in other games, according to some posts I've read. Since TTYD uses streamed music though, it just chucks the sound effects of the whole game into it for shits 'n giggles.
I haven't identified any particular structures yet, but I've noticed that each sample is padded with a varying number of 0x00 (NULL) bytes, which helped me manually extract a few sounds for testing to confirm my suspicions. I haven't confirmed that it's the GameCube 4-bit ADPCM format, but I'll do it eventually.
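The manual "split on the null padding" trick can be automated. This sketch assumes any run of at least `min_gap` 0x00 bytes is padding between two sounds; `split_on_padding` and the default of 32 are my own guesses. Genuinely silent stretches of ADPCM can also encode as zero bytes, so too small a gap could cut one sound in two.

```python
def split_on_padding(data: bytes, min_gap: int = 32) -> list[tuple[int, int]]:
    """Return (start, end) spans of data separated by runs of >= min_gap zero bytes.

    `end` is exclusive, and trailing zeros shorter than min_gap are trimmed.
    """
    spans = []
    start = None      # start offset of the span currently being read, if any
    zero_run = 0      # length of the current run of 0x00 bytes
    for i, b in enumerate(data):
        if b == 0:
            zero_run += 1
            if start is not None and zero_run == min_gap:
                # The padding began min_gap - 1 bytes ago; the span ends there.
                spans.append((start, i - min_gap + 1))
                start = None
        else:
            if start is None:
                start = i
            zero_run = 0
    if start is not None:
        spans.append((start, len(data) - zero_run))
    return spans
```

Each `(start, end)` pair can then be sliced out with `data[start:end]` and dumped to its own file for listening.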
.sdir - Sound directory? It has regularly repeating sections of roughly 32 bytes, each containing one of two common 4-byte sequences: 0x3C003E80 (<.>€ in the ASCII column) or 0x3C005622 (<.V").
Each structure also contains apparently incrementing numbers in 3 places, which might be incrementing pointers to data, perhaps offsets into the .samp file. There's a huge chunk of seemingly random data past 0x9BE0, but I've noticed a possibly repeating pattern of the form 0x0008?? (possibly followed by another 4 bytes) every 4 lines (128 bytes at my 32-byte width) in this chunk. I'm still trying to infer any meaning from any of this.
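One way to test the "roughly 32 bytes" guess is to find every occurrence of the two sequences and look at the distance between them. Read as a byte, a pad byte and a big-endian 16-bit value, the sequences decode to 60 and 16000 (0x3E80) or 22050 (0x5622). Those look a lot like a MIDI root note (60 is middle C) and a sample rate in Hz, but that reading is my assumption, not something confirmed. `find_markers` and `strides` are hypothetical helper names.

```python
import struct

# The two 4-byte sequences seen repeating in pmario.sdir.
MARKERS = (bytes.fromhex("3C003E80"), bytes.fromhex("3C005622"))


def find_markers(data: bytes) -> list[tuple[int, int, int]]:
    """Return (offset, first_byte, u16) for every marker occurrence, sorted by offset."""
    hits = []
    for marker in MARKERS:
        pos = data.find(marker)
        while pos != -1:
            # Guessed layout: u8, pad byte, big-endian u16.
            first, _pad, value = struct.unpack(">BBH", data[pos:pos + 4])
            hits.append((pos, first, value))
            pos = data.find(marker, pos + 1)
    return sorted(hits)


def strides(hits: list[tuple[int, int, int]]) -> list[int]:
    """Byte distance between consecutive hits; a constant value hints at a fixed record size."""
    return [b[0] - a[0] for a, b in zip(hits, hits[1:])]
```

Reading pmario.sdir in binary mode and running `collections.Counter(strides(find_markers(data)))` should show whether 32 really dominates, and where the pattern breaks down (e.g. around 0x9BE0).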
.slib - No idea, but it's 275KB, compared to the 87.6KB of the more interesting pmario.sdir. Must be important; it has regularly repeating pairs of bytes. Might be an image? Not sure yet.
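To put numbers on the "repeating pairs of bytes" in .slib, a frequency count of 16-bit values is a cheap first step. This is a generic analysis sketch (`common_words` is my own helper name); it assumes the pairs sit at even offsets and are big-endian, so pass `step=1` if they turn out not to be aligned.

```python
from collections import Counter


def common_words(data: bytes, top: int = 10, step: int = 2) -> list[tuple[int, int]]:
    """Count big-endian 16-bit values every `step` bytes; return the `top` most common."""
    counts = Counter(
        int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data) - 1, step)
    )
    return counts.most_common(top)
```

Printing the results with `f"{value:04X}"` makes them easy to match against what HxD shows.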
.pool - I have no idea what this is or what it's for, but I can seemingly recognize a repeating pattern of 64 bytes terminated with 0xFF00 (?) along with some incrementing parts at an offset of 30 bytes from the start of these supposed blocks... but at 0x0400/0x0890/0x0CA0/etc. the block size seems to change for some reason. I doubt I'll spend much time messing with this file.
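The changing block size in .pool is easier to pin down by splitting on the supposed 0xFF00 terminator and listing where the length changes. This assumes every block really ends in 0xFF00; if those two bytes also occur inside a block's data, you'll get false splits. `split_blocks` and `size_changes` are hypothetical names.

```python
def split_blocks(data: bytes, terminator: bytes = b"\xff\x00") -> list[tuple[int, int]]:
    """Return (start, length) for each block, assuming every block ends with `terminator`."""
    blocks = []
    start = 0
    pos = data.find(terminator)
    while pos != -1:
        end = pos + len(terminator)   # include the terminator in the block
        blocks.append((start, end - start))
        start = end
        pos = data.find(terminator, end)
    return blocks


def size_changes(blocks: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """(start, length) of each block whose length differs from the block before it."""
    return [cur for cur, prev in zip(blocks[1:], blocks) if cur[1] != prev[1]]
```

If the guess is right, the offsets returned by `size_changes` should line up with 0x0400, 0x0890, 0x0CA0 and so on.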
.hrf - ??? Contains "HRFi" as its first 4 bytes, then 8 bytes of unknown purpose (they didn't decode to any sensible numbers as 4-byte ints; maybe they're several shorts??), a null, then the file name "pmario.samp". There's a huge chunk of nulls, followed by some other bytes and "pmario_samp-0000000001.669". I doubt this will be of any use.
.etbl - Filename table for sound effects. The last byte before each new name seems to increment up to 0xFF and then roll back to 0x00, for no apparent reason and without changing anything else in the file. I don't get why. There's also junk in the names, up to the 30th character, that I explain below.
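Since I'm not sure whether those 8 bytes are ints or shorts (or big- vs little-endian), it's cheap to print every reading at once. This sketch assumes the layout described above: magic, 8 unknown bytes, a null at offset 12, then a null-terminated name. `inspect_hrf` is a made-up helper. The GameCube's CPU is big-endian, so the big-endian readings are the likelier candidates.

```python
import struct


def inspect_hrf(data: bytes) -> dict:
    """Decode the start of pmario.hrf a few ways, following the guessed layout."""
    if data[:4] != b"HRFi":
        raise ValueError("missing HRFi magic")
    unknown = data[4:12]                          # 8 bytes of unknown purpose
    if data[12] != 0:
        raise ValueError("expected a null byte at offset 12")
    name_end = data.index(b"\x00", 13)            # the name is null-terminated
    return {
        "u32_be": struct.unpack(">2I", unknown),  # two big-endian 32-bit ints
        "u32_le": struct.unpack("<2I", unknown),  # same bytes, little-endian
        "u16_be": struct.unpack(">4H", unknown),  # four big-endian shorts
        "name": data[13:name_end].decode("ascii"),
    }
```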
There doesn't seem to be any associated offset or length data in here, which makes me think other data might reference it by a fixed index (or the current index of the pmario.samp sound), multiplying that index by 32 (the length of each name record in bytes) and then reading 30 bytes to get the name of the sound. For example, SE3_AMB_RIVER1 is the 3rd entry, i.e. index 2 counting from 0: 2 * 32 = 64 = 0x40, so its record runs from 0x40 up to 0x60 (0x40 + 0x20, the 32-byte record length in hex), which corresponds to "SE3_AMB_RIVER1.IO_JUMP2..LING2..". Again, the junk is explained below.
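The index-times-32 lookup described above, as a sketch. It assumes 32-byte records whose first 30 bytes hold the name and whose last 2 bytes are the still-unexplained trailing bytes; `read_name` is a hypothetical helper, not anything from the game.

```python
RECORD_SIZE = 32   # bytes per name record in pmario.etbl (observed)
NAME_SIZE = 30     # bytes that hold the name; the last 2 are unknown


def read_name(table: bytes, index: int) -> tuple[str, bytes]:
    """Return (name, trailing_bytes) for the 0-based record `index`."""
    offset = index * RECORD_SIZE
    record = table[offset:offset + RECORD_SIZE]
    if len(record) != RECORD_SIZE:
        raise IndexError(f"record {index} is past the end of the table")
    # Anything after the first null is leftover junk, so stop there.
    name = record[:NAME_SIZE].split(b"\x00", 1)[0].decode("ascii")
    return name, record[NAME_SIZE:]
```

With the example record at index 2, this returns "SE3_AMB_RIVER1" plus the 2 trailing bytes, which is handy for watching how that incrementing last byte behaves.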
.stbl - Other filename table for sound effects? Not sure why there are two files, might be contextual.
On that note, pmario.stbl may come before pmario.etbl, judging by the junk in pmario.etbl. Also, I figured out what the ".IO_JUMP2..LING2.." junk means... it's basically a null terminator followed by an "after image" of the names that came before it*.
If a later name doesn't take up all 30 characters allowed (the last 2 bytes of each 32-byte record seem to be 'reserved' for a purpose I haven't identified yet), the leftover characters from earlier, longer names stay in place after its null terminator. In a way, it's like copying the line you typed to the next line, pressing Insert to switch to overtype mode, and partially overwriting it.
Either way, it doesn't really matter: even if the full string with junk is read into a 30-byte array for the filename (31 if you leave room for a null terminator), the null character right after a shorter name ends the string before the junk starts. If the name is the full 30 characters, the array simply ends there. Kinda simple in retrospect and not a major discovery worth being windbaggy about, but knowing it makes me happier at least.
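The "after image" explanation is easy to check with a small model of whatever tool wrote the table. This sketch assumes one 30-byte buffer that is reused for every name and never cleared, so each write only overwrites the name plus its null terminator; `write_names_reusing_buffer` is purely hypothetical and not the game's actual tooling.

```python
NAME_SIZE = 30


def write_names_reusing_buffer(names: list[str]) -> list[bytes]:
    """Model a writer that reuses one uncleared 30-byte buffer for every name."""
    buf = bytearray(NAME_SIZE)
    out = []
    for name in names:
        raw = name.encode("ascii")[:NAME_SIZE]
        buf[:len(raw)] = raw           # overwrite only as many bytes as the name needs
        if len(raw) < NAME_SIZE:
            buf[len(raw)] = 0          # terminator; the bytes after it are stale
        out.append(bytes(buf))         # snapshot, stale bytes included
    return out
```

A long name followed by a short one reproduces the junk pattern, and splitting each record at the first null recovers the short name cleanly; a full 30-character name has no null at all and is simply the whole buffer.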
pmario_sound_bgm_txt.db
pmario_sound_env_txt.db
pmario_sound_env_txt.db2
These three files all contain some sort of configuration data for the sounds played in-game, including streamed music. They appear to use the .tbl names as identifiers. As far as I can tell, the information stored in these files is effectively worthless from our perspective, aside from maybe giving slightly more verbose names for the music tracks. Everyone just renames those to the appropriate in-game names anyway, so even that information is worthless!
If you wish to examine these files yourself, I can make a zip/7z archive of the proj folder and upload it to MediaFire or something. Either way, you should be able to get the same files if you have a copy of TTYD (hopefully one you ripped yourself!)
*Example from pmario.etbl, nulls highlighted in red:
Example from pmario.stbl, nulls also highlighted in red, with a 30-character name circled in orange. The 31st character seems to increment almost randomly, and this example also shows the 32nd character incrementing pointlessly: