Wednesday, July 27, 2016

Unraveling Samsung PC Studio 3's SMS Database

Back when flip phones were everywhere, Samsung was a big name. My first phone was a Samsung SGH-T339 (a T-Mobile exclusive). It got me through middle school, so as you can imagine, I used the texting a lot: sagas of he-said/she-said drama and other assorted 7th-grade nonsense. I was not above it by any stretch of the imagination. Since I was (and still am) addicted to archiving everything, I used a program called Samsung PC Studio 3 to offload my text messages whenever the phone's message storage filled up. The backup database was in a nonstandard format, and I wanted to convert it to something I could open in Excel, like CSV. So it was time to reverse engineer!

I hope this process helps people with similar deconstruction ambitions. This was a whole lot of fun to do.

I wanted to automate the process as much as I could, so I wrote a C program to deal with the internals of the file. Basically, the program reads specific byte ranges of the file directly into variables: shorts, doubles, and character arrays.
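Reading bytes straight into a short or double with fread only works when the host's byte order matches the file's. Here is a minimal sketch of a more portable alternative, assuming (this is not confirmed by the spec) that multi-byte values in the database are little-endian, which is typical for a Windows program. The helper names read_le16 and read_le_double are hypothetical, not taken from the repository.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Read a 16-bit little-endian integer (ASSUMED file byte order).
 * Returns 0 on success, -1 if fewer than 2 bytes were available. */
static int read_le16(FILE *fp, uint16_t *out)
{
    unsigned char b[2];
    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return -1;
    *out = (uint16_t)(b[0] | (b[1] << 8));
    return 0;
}

/* Read an 8-byte IEEE 754 double stored little-endian, regardless of host byte order.
 * Assumes the host's double is IEEE 754 (true on all mainstream platforms). */
static int read_le_double(FILE *fp, double *out)
{
    unsigned char b[8];
    uint64_t bits = 0;
    if (fread(b, 1, sizeof b, fp) != sizeof b)
        return -1;
    for (int i = 7; i >= 0; i--)
        bits = (bits << 8) | b[i];     /* assemble the most significant byte first */
    memcpy(out, &bits, sizeof *out);   /* reinterpret the bit pattern as a double */
    return 0;
}
```

On a little-endian machine these give the same results as a plain fread into the variable; they just stop depending on that.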

Figuring out the basic structure was easy, but it took a while to figure out why some things were breaking. Basically, you go through the file byte by byte in your favorite hex editor and look for values that you know exist in the database. For example, finding the strings was easy. They were characters with null bytes between them (or maybe they were 16-bit characters to support Unicode?), preceded by a single length byte; if that byte was 255, it was followed by a short holding the real length. You should also look for numbers that differ from record to record. In this file, for example, a 1 meant incoming and a 2 meant outgoing. Finally, look for recurring byte patterns, since they may mark field or record boundaries.
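The string layout described above can be decoded with a few lines of C. This is a sketch under stated assumptions: the length counts characters rather than bytes, the 255 escape is followed by a little-endian short, and each character is two bytes (UTF-16LE), which is why ASCII text shows up as letters separated by nulls. The function name read_prefixed_string is hypothetical, and non-ASCII characters are simply replaced with '?'.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Decode a length-prefixed string starting at buf[*pos].
 * ASSUMED layout: 1 length byte; if it is 0xFF, a little-endian 16-bit length follows.
 * ASSUMED encoding: 2 bytes per character (UTF-16LE); non-ASCII units become '?'.
 * Returns a malloc'd NUL-terminated string and advances *pos, or NULL if truncated.
 */
static char *read_prefixed_string(const unsigned char *buf, size_t size, size_t *pos)
{
    size_t p = *pos;
    if (p >= size)
        return NULL;
    size_t len = buf[p++];
    if (len == 0xFF) {                       /* escape: the real length is in a short */
        if (p + 2 > size)
            return NULL;
        len = (size_t)buf[p] | ((size_t)buf[p + 1] << 8);
        p += 2;
    }
    if (len > (size - p) / 2)                /* need 2 bytes per character */
        return NULL;
    char *s = malloc(len + 1);
    if (s == NULL)
        return NULL;
    for (size_t i = 0; i < len; i++) {
        unsigned unit = buf[p + 2 * i] | (buf[p + 2 * i + 1] << 8);
        s[i] = unit < 0x80 ? (char)unit : '?';
    }
    s[len] = '\0';
    *pos = p + 2 * len;                      /* skip past the string */
    return s;
}
```

For example, the bytes `03 48 00 69 00 21 00` decode to "Hi!", and the cursor ends up just past the last character, ready for the next field.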

The date was harder to deal with. I searched for a long time for something that looked like a timestamp, and then I noticed a double tucked into the record. It turns out there's a datetime format called the OLE Automation date that counts the number of days since midnight, December 30, 1899. The fractional part of the number encodes the time of day, so 2.5, for example, means noon on January 1, 1900. Microsoft has a short description of the format.
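Converting an OLE date is just a shift and a scale: day 25569 is January 1, 1970 (the Unix epoch), and each day is 86400 seconds. A minimal sketch follows; the function names are hypothetical, and it assumes the stored value is plain wall-clock time, so it formats with gmtime to avoid applying any time zone offset.

```c
#include <stddef.h>
#include <time.h>

/* Day 0 of an OLE date is 1899-12-30 00:00, so 1970-01-01 (the Unix epoch) is day 25569. */
#define OLE_DAYS_TO_UNIX_EPOCH 25569.0
#define SECONDS_PER_DAY 86400.0

/* Convert an OLE date to seconds since the Unix epoch, rounded to the nearest second.
 * Only meaningful for non-negative OLE dates (on or after 1899-12-30). */
static time_t ole_to_unix(double ole)
{
    double secs = (ole - OLE_DAYS_TO_UNIX_EPOCH) * SECONDS_PER_DAY;
    return (time_t)(secs < 0 ? secs - 0.5 : secs + 0.5);
}

/* Format an OLE date as "YYYY-MM-DD HH:MM:SS". gmtime is used so no time zone
 * offset is applied: the stored value is ASSUMED to already be wall-clock time. */
static int format_ole_date(double ole, char *out, size_t outsize)
{
    time_t t = ole_to_unix(ole);
    struct tm *parts = gmtime(&t);
    if (parts == NULL)
        return -1;
    return strftime(out, outsize, "%Y-%m-%d %H:%M:%S", parts) > 0 ? 0 : -1;
}
```

For instance, 42578.75 formats as "2016-07-27 18:00:00". Dates before December 30, 1899 are negative and follow a different rule for the fractional part, but no text message is that old, so the sketch ignores them.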

This may sound confusing, but take a look at the spec I tried to write (it has some holes). It gives a pretty decent idea of how the file is laid out. The first record was a little different from the rest, so I just handled it manually.

The program itself is simply a loop of fread and fseek calls that writes out a CSV file. Take a look at the GitHub repository to see how it works. It uses nothing beyond the standard C library, so it should compile with any C compiler. I haven't tried it on a Windows computer yet, but it should work there too.
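One detail worth getting right when writing the CSV: text messages are full of commas, quotes, and line breaks, and a naive `fprintf(out, "%s,%s\n", ...)` would split a message across columns. Here is a minimal sketch of the usual RFC 4180 quoting rules; write_csv_field and write_csv_row are hypothetical names, not functions from the repository.

```c
#include <stdio.h>
#include <string.h>

/* Write one CSV field. Fields containing a comma, quote, CR, or LF are wrapped
 * in double quotes, and any embedded quote is doubled (RFC 4180 style). */
static void write_csv_field(FILE *out, const char *field)
{
    if (strpbrk(field, ",\"\r\n") == NULL) {
        fputs(field, out);                 /* nothing special: write as-is */
        return;
    }
    fputc('"', out);
    for (const char *c = field; *c != '\0'; c++) {
        if (*c == '"')
            fputc('"', out);               /* escape " as "" */
        fputc(*c, out);
    }
    fputc('"', out);
}

/* Write n fields as one comma-separated row terminated by a newline. */
static void write_csv_row(FILE *out, const char *const *fields, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            fputc(',', out);
        write_csv_field(out, fields[i]);
    }
    fputc('\n', out);
}
```

Excel reads quoted fields like these correctly, including line breaks inside a message. On the reading side, open the database itself with `fopen(path, "rb")`: on Windows, text mode translates line endings and can treat byte 0x1A as end-of-file, which would corrupt the binary reads.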

I hope this little program helps you retrieve your long-lost text messages. It worked fairly well for me and was fun to write. I've always thought data should be saved in an open format, so other programs can read the files in case the original one becomes defunct. But I understand the arguments against that.
