With the recent public release of Ghidra I felt like reverse engineering something, and as I have a personal interest in reading VNs (or visual novels), I decided to RE some VN engine. This post contains a rough outline of some of the RE steps I took to end up with pylivemaker, a toolkit for manipulating and patching LiveMaker/LiveNovel game resources.
About VNs
VNs as a medium originated in Japan, with NScripter and Kirikiri being the most common (to my knowledge) engines for Japanese VNs. In the west, Ren’Py is also a popular option. VN engines generally work by parsing some script format, which will contain a novel’s text as well as commands to display images and play audio at the appropriate times alongside the novel text.
An engine’s script format may also include some kind of support for branching paths based on a player’s gameplay choices, as in a choose-your-own-adventure style book (ex. Ren’Py script). In most cases, a VN’s game files will simply consist of the original script files in a plaintext format, plus any resource images, audio files, etc. These files are packed into some (possibly proprietary) archive format, and then distributed along with an interpreter executable, which can unpack the archive and parse/execute the script files.
For the purposes of this exercise, my goal was to RE some engine to the point that I could extract a VN’s text script from the game file(s), modify the script, and re-insert it into the original game files via a patch. This type of patching is frequently done for the purpose of releasing unofficial translations of Japanese VNs (which until relatively recently, would rarely get official releases outside of Japan). I also wanted to select something obscure enough that tools for patching my engine did not already exist (for example, KrkrExtract for Kirikiri). As a result, I selected LiveMaker.
About LiveMaker/LiveNovel
LiveMaker is a (now defunct) Japanese engine/toolkit for making PC games, and one of the tools included with LiveMaker, LiveNovel, is for making ADV (adventure) style VNs. LiveNovel provides a GUI interface to make VNs without requiring any actual scripting by the user (aside from entering the actual novel text). LiveMaker was available for free, but there was also a paid version that added additional features, most notably the ability for a user to encrypt their resources to prevent artwork/audio/etc. from being extracted from a game.
Note: Although the company that developed LiveMaker no longer exists, the most recent free version of LiveMaker can still be downloaded through archive.org, or from here.
Running LiveMaker (and LiveNovel) requires setting your Windows non-unicode system locale to Japanese. LocaleEmulator is not a sufficient substitute: the applications themselves will start with LocaleEmulator, but building/testing a LiveMaker or LiveNovel project will fail unless the actual system locale is set properly.
For reference, the LiveNovel tutorial and documentation which is bundled with LiveMaker is also available here (in Japanese). Some of the example LiveMaker/LiveNovel images in this post are taken from the tutorial docs.
The LiveNovel GUI provides a visual flow-chart for managing scenario branching and control-flow, as well as a pseudo-WYSIWYG editor for writing the actual novel text and inserting images/audio/etc. For users that wish to write more traditional VN scripts, LiveNovel also supports what they call an “HTML-like” scripting language.
When a user provides a script in the “HTML-like” format, it is converted into the same internal LiveNovel text format (.lns) used by the WYSIWYG editor.
Once a LiveNovel project is completed and ready to be distributed, the user can export their game as either a traditional Windows installer, or as a standalone executable.
When distributing via an installer, the game files will be packed into an archive (.dat) separate from the interpreter executable.
When distributing via a standalone executable, the game archive is appended to the end of the interpreter executable.
When building the actual game archive, the internal text format is compiled into a binary file format (.lsb) that can be read by the interpreter executable.
This means that when the game files are unpacked, the original plaintext script is unavailable.
So in order to translate and patch a LiveNovel game, a potential tool would have to be able to:
- Unpack a game archive (or standalone .exe)
- Convert the binary .lsb format into some human readable format to allow script editing
- Convert the edited script back into binary .lsb format
- Insert the new .lsb file back into the original game archive (or standalone .exe)
At the time I started this exercise, I could not find any examples of tools for patching LiveMaker games, although one toolkit named irl had been kickstarted but eventually abandoned.
GARbro, a popular tool for browsing and extracting the contents of many VN archive formats, does include support for LiveMaker archives and LiveMaker’s proprietary image format (.gal), but it does not support doing anything with .lsb scripts, or repacking an archive.
RE Notes
Before getting into details about LiveMaker internals, here are some things to note:
- LiveMaker (including both the game-making applications and the game interpreters) is a Delphi program
- I am not a Delphi or Pascal developer, so most of the terms I use will be from the standpoint of a C developer, with object oriented stuff from C++ and/or Python (i.e. I might refer to a class function table as a vtable, but apparently the appropriate Delphi term would be VMT?)
- This also applies to my Ghidra project, where a lot of the types/variables that I added are not named in PascalCase, and where I use this a lot even though I am not sure whether the this pointer is the correct term in Delphi.
- Importing the proper class/type/symbol information can be done by exporting from IDR (Interactive Delphi Reconstructor) to IDA Pro, and then from IDA Pro to Ghidra
- Ghidra identifies LiveMaker applications properly as borlanddelphi executables, with some caveats.
- Ghidra does not do anything with the class/type/symbol information in the Delphi .rsrc section.
- The Ghidra decompiler mostly works correctly on Delphi functions, except when Delphi exception handling is involved.
- Based on my very limited understanding of Delphi exception handling, Ghidra decompiles an individual try block as its own function, so if a variable exists outside the scope of the try block, the Ghidra decompiler does not know what to do with it.
- A link to my Ghidra project can be found on the pylivemaker wiki
LiveMaker VF Archive Format
Since examples for unpacking LiveMaker .dat archives and standalone .exe files already exist, I did not need to spend too much RE time on extracting archives, although there were a few issues that I needed to sort out for packing archives.
The LiveMaker VF archive format is a packed structure with the following format:
# all ints are little-endian
Directory Header:
    16-bit "vf" signature
    32-bit int version (VF archive format version, not LiveMaker version)
    32-bit int file_count
Filenames (array with length <file_count>):
    32-bit int prefixed pascal strings (CP932 (MS Shift-JIS) encoded)
Offsets (array with length <file_count> + 1):
    32-bit int offset_low
    32-bit int offset_high
Compression flags (array with length <file_count>):
    8-bit uint compression method
Unknown list (array with length <file_count>):
    32-bit int unk1
Checksums (array with length <file_count>):
    32-bit int checksum
Encryption flags, 0 if not encrypted (array with length <file_count>):
    8-bit uint encrypt_flag
File data
    ...
If the archive is appended to the end of a standalone exe, the file data will be followed by:
Trailer:
    32-bit int offset (points to start of archive directory "vf")
    16-bit "lv" signature
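As a minimal sketch of how the trailer and directory header described above could be read in Python: the function names are hypothetical (they are not pylivemaker's API), and treating the trailer offset as unsigned is an assumption on my part.

```python
import struct


def find_vf_archive(exe_bytes):
    """Return the offset of the "vf" directory inside a standalone exe.

    Hypothetical helper. Assumes the last 6 bytes are the trailer:
    a 32-bit little-endian offset followed by the "lv" signature.
    """
    offset, sig = struct.unpack("<I2s", exe_bytes[-6:])
    if sig != b"lv":
        raise ValueError("no LiveMaker trailer found")
    if exe_bytes[offset:offset + 2] != b"vf":
        raise ValueError("trailer does not point at a VF directory")
    return offset


def read_vf_header(data, offset=0):
    """Read the fixed part of the directory header: signature, version, count."""
    sig, version, file_count = struct.unpack_from("<2sii", data, offset)
    if sig != b"vf":
        raise ValueError("not a VF archive")
    return version, file_count


# Fake 32-byte "exe" followed by a tiny archive and trailer, for illustration.
fake_exe = b"MZ" + b"\x00" * 30
fake_exe += b"vf" + struct.pack("<ii", 100, 3) + b"data"
fake_exe += struct.pack("<I2s", 32, b"lv")

start = find_vf_archive(fake_exe)
print(start, read_vf_header(fake_exe, start))
```

Everything after the header (filenames, offsets, flags) depends on the file count and possibly on the XOR obfuscation, so it is left out of this sketch.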
LiveMaker archive offsets are actually 32-bit unsigned integers, but LiveMaker stores them across two signed integer fields.
I am assuming that this is related to their internal integer types only being signed, but regardless of the reason, the offset_low field contains the least-significant 31-bits of the offset, and the offset_high field contains 0 if the most significant offset bit is 0, or 0xffffffff if the most significant offset bit is 1.
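Recombining the two fields, as described above, might look like the following sketch. The function name is hypothetical, and masking offset_low down to 31 bits (so that the result is the same whether the field was read as signed or unsigned) is my own choice.

```python
def join_offset(offset_low, offset_high):
    """Rebuild a 32-bit unsigned offset from the two stored fields.

    Hypothetical helper following the description in the text:
    offset_low carries the low 31 bits, and offset_high is either
    0 or 0xffffffff (-1 when read as signed) to signal the top bit.
    """
    low31 = offset_low & 0x7fffffff
    top_bit_set = (offset_high & 0xffffffff) == 0xffffffff
    return (0x80000000 if top_bit_set else 0) | low31


print(hex(join_offset(0x10, 0)))
print(hex(join_offset(0x10, 0xffffffff)))
```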
In certain LiveMaker versions, the filenames and offsets fields in the archive directory are obfuscated by XORing the data with a fixed keystream.
The keystream comes from LiveMaker’s TTpRandom PRNG class (which generates a stream of 32-bit integers), initialized with a fixed seed value.
In the decompiled generator, this + 8 is the last key value and this + 4 is the original seed value.
Python implementation of the keystream generator:
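# LIVEMAKER3_XOR_SEED is the fixed seed constant used by LiveMaker
def keystream(seed=LIVEMAKER3_XOR_SEED):
    key = 0
    while True:
        # key = key * 5 + seed, truncated to 32 bits
        key = ((key << 2) + key + seed) & 0xffffffff
        yield key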
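To illustrate how such a keystream can be applied, here is a small self-contained sketch that XORs a list of 32-bit values with successive keys. The seed used is a placeholder rather than LiveMaker's real constant, and xor_words is a hypothetical helper: exactly how LiveMaker applies the keys to the filename and offset fields is not covered here.

```python
def keystream(seed):
    """Same generator as above, but with the seed passed in explicitly."""
    key = 0
    while True:
        key = ((key << 2) + key + seed) & 0xffffffff
        yield key


def xor_words(words, seed):
    """XOR each 32-bit word with the next key from the stream (hypothetical)."""
    return [w ^ k for w, k in zip(words, keystream(seed))]


DEMO_SEED = 1  # placeholder seed for illustration only
plain = [0x1000, 0x2000, 0x3000]
obfuscated = xor_words(plain, DEMO_SEED)

# XOR is its own inverse, so the same call both obfuscates and restores.
print(xor_words(obfuscated, DEMO_SEED) == plain)
```

Since the operation is symmetric, a repacking tool could reuse the same routine for reading and writing the directory.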
Note: prior to this exercise, everything in the directory header up to and including the compression flags array (including the XOR obfuscation) was already handled properly by both irl and GARbro, with the exception of the offset_high field, which both projects ignore as an unused field (it is unused in VF archive versions < 101).
For repacking a (patched) archive, I needed to RE LiveMaker’s method for generating file checksums, since in both irl and GARbro, the checksum field is ignored entirely. In LiveMaker, save data is stored in the same VF archive format as the game data. By following the function calls used to write to a save file, I eventually was able to identify the function used to checksum a byte stream (in this case, the byte stream containing the save file data).
In this case, the function reads some amount of data from a [file] stream into a buffer, and then checksums the data in the buffer.
This continues until the entire stream has been read.
However, for this function, the actual return value is stored in a stack variable (at address EBP - 0xc) which exists outside of the Delphi equivalent of a try/catch block, and the Ghidra decompiler does not recognize it as an actual stack variable.
The checksum calculation itself happens in the loop which Ghidra decompiles as
if (-1 < b_high + -1) {
    do {
        local_EAX_185 = local_EAX_185 + -1;
    } while (local_EAX_185 != 0);
}
Since the returned checksum value lives outside of the Delphi exception handling block, the Ghidra decompiler only recognizes the loop counter local_EAX_185.
But if we inspect the assembly for that loop, we can see:
0019dffe 7c 24 JL LAB_0019e024
0019e000 40 INC b_high
0019e001 8a 5d f8 MOV BL,byte ptr [EBP + checksum]
0019e004 32 1a XOR BL,byte ptr [b_high_00]
0019e006 81 e3 ff AND EBX,0xff
00 00 00
0019e00c 8b 35 b8 MOV ESI,dword ptr [->gvar_vfChecksumKeys]
e0 1c 00
0019e012 8b 1c 9e MOV EBX,dword ptr [ESI + EBX*0x4]
0019e015 8b 75 f8 MOV ESI,dword ptr [EBP + checksum]
0019e018 c1 ee 08 SHR ESI,0x8
0019e01b 33 de XOR EBX,ESI
0019e01d 89 5d f8 MOV dword ptr [EBP + checksum],EBX
0019e020 42 INC b_high_00
0019e021 48 DEC local_EAX_185
0019e022 75 dd JNZ LAB_0019e001
gvar_vfChecksumKeys is a hard-coded array of integers, and is consistent across LiveMaker versions.
checksum is EBP + -0xc, and is initialized to 0xffffffff at the start of the function (not pictured in screenshot).
checksum is also XOR’d with 0xffffffff before the final value is returned (also not pictured).
In Python, the actual checksum calculation can be written as:
def checksum(data):
    csum = 0xffffffff
    for c in data:
        # index the key table with (low byte of checksum) XOR (data byte)
        x = (csum & 0xff) ^ c
        x = VF_CHECKSUM_KEYS[x]
        csum = (csum >> 8) ^ x
    return csum ^ 0xffffffff
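The loop above has the same shape as a standard table-driven CRC-32 (initial value 0xffffffff, one table lookup per byte, final XOR with 0xffffffff). The sketch below substitutes the standard reflected CRC-32 table for VF_CHECKSUM_KEYS so that it runs on its own. That substitution is purely an assumption for demonstration purposes, and I have not established here that LiveMaker's key table is the standard one.

```python
import zlib


def make_crc32_table():
    """Build the standard reflected CRC-32 table (polynomial 0xEDB88320)."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


# Stand-in for VF_CHECKSUM_KEYS: an assumption, not the extracted table.
ASSUMED_KEYS = make_crc32_table()


def checksum(data, keys=ASSUMED_KEYS):
    """Same loop as above, with the key table passed in as a parameter."""
    csum = 0xffffffff
    for c in data:
        csum = (csum >> 8) ^ keys[(csum & 0xff) ^ c]
    return csum ^ 0xffffffff


sample = b"123456789"
# With the standard table, the loop agrees with zlib's CRC-32.
print(checksum(sample) == zlib.crc32(sample))
```

Passing the table as a parameter makes it easy to swap in the real extracted key array and compare the result against checksums stored in an existing archive.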
LiveMaker/LiveNovel LSB format
Upon extracting a LiveNovel VN, the first thing you will notice is that the archive contains mostly .lsb files.
AGLS ❯ ll
total 6.4M
-rw-r--r-- 1 pmrowla staff 64K Mar 20 16:37 00000001.lsb
-rw-r--r-- 1 pmrowla staff 356K Mar 29 21:59 00000623.lsb
-rw-r--r-- 1 pmrowla staff 522K Mar 20 16:37 000006FB.lsb
-rw-r--r-- 1 pmrowla staff 489K Mar 20 16:37 000006FD.lsb
-rw-r--r-- 1 pmrowla staff 344K Mar 20 16:37 000006FF.lsb
-rw-r--r-- 1 pmrowla staff 901K Mar 20 16:37 00000732.lsb
-rw-r--r-- 1 pmrowla staff 610K Mar 20 16:37 00000734.lsb
-rw-r--r-- 1 pmrowla staff 1.6M Mar 20 16:37 00000736.lsb
-rw-r--r-- 1 pmrowla staff 495K Mar 20 16:37 00000782.lsb
-rw-r--r-- 1 pmrowla staff 42K Mar 20 16:37 00000784.lsb
-rw-r--r-- 1 pmrowla staff 330 Mar 20 16:37 INSTALL.DAT
-rw-r--r-- 1 pmrowla staff 3.1K Mar 20 16:37 TOP_MENU.lpm
-rw-r--r-- 1 pmrowla staff 1.7K Mar 20 16:37 live.lpb
drwxr-xr-x 30 pmrowla staff 960 Mar 20 16:37 サウンド
-rw-r--r-- 1 pmrowla staff 3.3K Mar 20 16:37 シーン回想.lsb
-rw-r--r-- 1 pmrowla staff 2.2K Mar 20 16:37 変数初期化.lsb
drwxr-xr-x 70 pmrowla staff 2.2K Mar 20 16:37 グラフィック
drwxr-xr-x 36 pmrowla staff 1.2K Mar 20 16:37 ノベルシステム
-rw-r--r-- 1 pmrowla staff 3.0K Mar 20 16:37 ゲームメイン.lsb
-rw-r--r-- 1 pmrowla staff 30K Mar 20 16:37 メッセージボックス作成.lsb
-rw-r--r-- 1 pmrowla staff 4.3K Mar 20 16:37 メッセージボックス座標.lsb
The files named numerically contain the binary versions of the novel’s user-generated “chart” scripts.
The files named in Japanese are LiveMaker/LiveNovel system scripts (ex. ゲームメイン.lsb translates to gamemain.lsb), although they are not fixed per specific LiveMaker version, since they may be modified depending on user LiveMaker project settings.
If you examine a LiveNovel project directory before it is exported (by following the tutorial for example), you will see that each output .lsb file has an equivalent .lsc file. Depending on the LiveMaker version, the .lsc file may be in an XML format or a text format, but the end result is that they contain a series of script commands which are then compiled into the binary .lsb format.
<Item Command="Call" Indent="0" Mute="0" NotUpdate="1" Color="0">
<Page>変数初期化.lsc</Page>
<Result />
<Calc>TRUE</Calc>
<Params />
</Item>
In this example, we can see that this is a “Call” command, which calls (executes) the specified script file.
Each possible command type has an equivalent internal class, all of which are subclasses of TCommand.
By following the function calls made when opening and parsing an .lsb file, we can reverse engineer the .lsb format, and see how the XML (or text) .lsc version of a command is compiled into the binary .lsb version.
Note: prior to this exercise, irl already included basic functionality for parsing an older version of LSB files and dumping information about command types, but it was not documented at all from an RE standpoint, and did not run correctly on the sample LSB I used for RE’ing the engine.
In the above function, we can see the fields in the LSB file header being read, and then a loop which reads one command at a time from the file, based on a 1-byte long command_type field.
createTcomFromType() contains a switch statement, to construct and return an instance of the appropriate TCom… class for this type value.
At this point, we can determine that the format for an LSB file is as follows:
# all ints little-endian
32-bit int version (LSB format version, not LiveMaker version)
8-bit int flags
32-bit int command_count (total number of commands in this file)
32-bit int param_stream_size (size of command param stream in bytes)
<command_params> (array of bytestreams containing bitflags for each possible command type which specify whether or not a given command accepts a specific parameter type)
<commands> (list with length command_count where each command has a variable length, but starts with a 1-byte int type field)
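As a rough sketch, the fixed-size fields at the start of an LSB file could be read with Python's struct module as shown below. The function name is hypothetical, and reading the 32-bit fields as signed is an assumption. The variable-length command_params and commands sections are deliberately not parsed.

```python
import struct

# version (int32), flags (uint8), command_count (int32), param_stream_size (int32)
LSB_HEADER = struct.Struct("<iBii")


def parse_lsb_header(data):
    """Parse the fixed 13-byte header at the start of an LSB file (hypothetical)."""
    version, flags, command_count, param_stream_size = LSB_HEADER.unpack_from(data, 0)
    return {
        "version": version,
        "flags": flags,
        "command_count": command_count,
        "param_stream_size": param_stream_size,
        # offset of the first byte after the fixed header fields
        "body_offset": LSB_HEADER.size,
    }


# Made-up header values, for illustration only.
demo = struct.pack("<iBii", 117, 1, 42, 5) + b"\x00" * 8
print(parse_lsb_header(demo))
```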
After constructing an instance of the appropriate command class, the parser for the given command is then called.
In the example above, the second parameter (stream) is a pointer to a Delphi TStream subclass instance.
*(this) + 4 refers to a function pointer in the vtable for TStream (or a subclass’s vtable), and in this case, the function pointer at offset 4 is TStream->Read().
Note: TStream->Read() itself is declared as abstract, but the actual TFileStream/TCustomMemoryStream/etc. subclass instance passed into a parser will have a defined Read() method.
From here, we can see that the parsing function for a TComElse command reads a 4-byte field, 1-byte field, 1-byte field, 4-byte field (in that order) into class instance variables.
As such, we can determine that the structure of a TComElse command stored in the binary LSB file would be:
struct {
    char command_type;
    int32_t;
    char;
    char;
    int32_t;
}
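A minimal Python sketch of reading that structure from a byte buffer follows. The field names (field_a through field_d) are placeholders because the fields are not identified at this point, the signedness of each field is an assumption, and the command_type value used in the demo is arbitrary.

```python
import struct
from collections import namedtuple

# Placeholder field names: the real meaning of each field is not covered here.
TComElse = namedtuple("TComElse", "command_type field_a field_b field_c field_d")

# 1-byte type, int32, byte, byte, int32: packed with no padding (11 bytes).
TCOM_ELSE = struct.Struct("<BiBBi")


def parse_tcom_else(data, offset=0):
    """Return the parsed record and the offset of the next command."""
    record = TComElse(*TCOM_ELSE.unpack_from(data, offset))
    return record, offset + TCOM_ELSE.size


# Arbitrary demo values, not taken from a real LSB file.
raw = struct.pack("<BiBBi", 7, 2, 0, 1, 5)
record, next_offset = parse_tcom_else(raw)
print(record, next_offset)
```

Returning the next offset alongside the record lets a caller walk the command list one variable-length command at a time.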
Some command parsers call other command parsers. For example, the TComIf parser first calls the TComElse parser, meaning that a TComIf command structure starts with the same fields as TComElse, and then contains an additional parameter list field.
Note: TLiveParser is an internal LiveMaker class for storing and evaluating expressions (i.e. “a + b”) which can refer to game variables, but the implementation details are not particularly relevant to this post.
Identifying what each field does by examining function calls for each command type would be the obvious next thing to do, but since there are ~60 command types I am leaving out the details from this post. If you are really curious, you can refer to the pylivemaker docs for specifics on each possible LiveMaker command type and their relevant fields.
For parsing the actual novel script contents, TComTextIns commands represent a scenario text block from a LiveNovel chart.
Each TComTextIns command contains a TpWord block which is a binary compiled version of the internal LiveNovel .lns scenario script format.
A TpWord block contains an array of TpWd subclasses which represent either individual text characters, or specific (HTML-like) tags from LiveNovel’s internal script format.
In order to RE the compiled scenario script format, you would simply follow the same steps as taken to RE the binary .lsb format, replacing the individual TCom… parsing functions with the individual TpWd… parsing functions (the end result of which is documented here).
Patching
Once all of the RE legwork was done to identify the necessary binary formats, all that was left was to write tools to do the following:
- Decompile a binary TpWord scenario script back into a human readable script (i.e. with the (HTML-like) tags from LiveNovel’s script format)
- Compile a decompiled and edited/translated script back into a binary TpWord block
- Replace the original TpWord scenario script from an LSB file with a newly edited and (re)compiled one
- Patch the modified LSB file back into a game archive