Update (2/11 9:30): Due to a mistake on our side, the API was not responding correctly until 1 November at 11:00, when we fixed it. We apologize and are postponing the deadline by one day. More details are in the discussion forum.
The aim of this task is to access the contents of a secret file stored on our server. You can interact with the server by sending it commands through an API. However, the commands need to be authenticated with a custom hash-based authentication scheme. To access the secret file you will need to break this authentication scheme and its implementation. The authentication is based on the Merkle-Damgård construction. The hash function has all the properties (strengths and weaknesses, such as the length extension attack) you would expect from such an MD-based function.
You will be interacting with the server through the API described below. The API lets you send Unix-like commands, which it then executes on the server. To prevent the execution of potentially malicious commands, the API expects a valid signature. A hash-based authentication function is used to calculate the signatures. To authorize a command (i.e., to obtain a valid signature), use the Authorize endpoint. Then, to run the command, use the Run endpoint. The Run endpoint recognizes two commands: ls and cat. These are Unix-like commands to list and concatenate (or view) files. However, the commands on the server are only simple variants of the Unix commands and do not support any options (or flags). If you are familiar with the Unix ls and cat commands, think of our variants as being invoked as ls <filename> or cat <filename> (again, no flags are supported).
At some point, you will need to work directly with the state of the hash function used on the server, so we provide its implementation below. This function has more weaknesses than a well-made Merkle-Damgård hash function, but you will not need to use those to solve this task.
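To make the notion of "state" concrete, here is a minimal toy sketch of a Merkle-Damgård style hash. It is not the server's function: the block size, the initial state, the padding layout, and the SHA-256-based compress function are all assumptions chosen for illustration, and the names compress, md_pad and md_hash are hypothetical. The sketch only shows the generic property that the digest equals the internal state after the last block, so hashing can be resumed from a known digest.

```python
import hashlib

BLOCK = 16          # toy block size in bytes (assumption, not the server's)
IV = b"\x00" * 8    # toy initial state (assumption)

def compress(state: bytes, block: bytes) -> bytes:
    """Stand-in compression function: mixes the state with one block."""
    return hashlib.sha256(state + block).digest()[:8]

def md_pad(total_len: int) -> bytes:
    """Toy padding for total_len bytes: 0x80, zeros, 8-byte big-endian length."""
    zeros = -(total_len + 1 + 8) % BLOCK
    return b"\x80" + b"\x00" * zeros + total_len.to_bytes(8, "big")

def md_hash(msg: bytes, state: bytes = IV, prior_len: int = 0) -> bytes:
    """Hash msg, optionally resuming from `state` after prior_len absorbed bytes."""
    data = msg + md_pad(prior_len + len(msg))
    for i in range(0, len(data), BLOCK):
        state = compress(state, data[i:i + BLOCK])
    return state  # the digest is simply the final internal state

# A 32-byte secret prefix, as in sign(); its value is unknown to the client.
secret = b"k" * 32
msg = b"hello"
tag = md_hash(secret + msg)

# Knowing only tag, msg and the prefix length, the hash can be continued.
glue = md_pad(len(secret) + len(msg))
suffix = b"world"
resumed = md_hash(suffix, state=tag,
                  prior_len=len(secret) + len(msg) + len(glue))

assert resumed == md_hash(secret + msg + glue + suffix)
```

The real padding scheme and state layout are defined in merkled_amgard.py, so check them there rather than relying on the toy values used in this sketch.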
You can download the code of the hash function here: merkled_amgard.py. You can also download a minimal working example of interacting with the endpoints in Python here: mwe.py. We suggest you start from mwe.py, especially if you are not familiar with how the HTTP protocol works, i.e., how clients and a server exchange messages with each other. You can also have a look at the documentation for the Requests Python module. For now, we intentionally skip over how the query string should be constructed for the GET requests. As understanding this is crucial, we will return to it at the end of the assignment.
Now, we present some common functions used in the server code and the API endpoints that will process the requests you send to the server. We suggest you open the mwe.py file now and try to see how the client code (i.e., mwe.py) and the server code (presented here) interact with each other. You don't need the complete server code, and therefore we provide only the necessary parts. In case you wonder, the server is implemented in the Flask web framework. Understanding the following functions is crucial for figuring out the attack.
# The authentication_prefix is a randomly generated 32 byte fixed secret value
authentication_prefix = b"..."
len(authentication_prefix) == 32


def verify(data, signature):
    """Verify that `signature` is valid for `data` by re-computing it and comparing."""
    if signature != MerkledAmgard(authentication_prefix + data).hexdigest():
        raise ValueError('Invalid signature')


def sign(data):
    """Sign `data` by prepending the authentication_prefix and hashing the whole thing."""
    return MerkledAmgard(authentication_prefix + data).hexdigest()


def parse(query_string):
    """
    Parse query string and also get it in bytes.

    Returns two objects:
     - First a dictionary with parsed query string.
       Multiple values are overridden (e.g., "?cmd=ls&cmd=aaaa" -> {"cmd": "aaaa"})
     - Secondly the raw query string unquoted into bytes.
       (i.e., unquotes raw bytes: %00 -> b'\x00')
    """
    return dict(parse_qsl(query_string, errors='ignore')), unquote_to_bytes(query_string)
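The two return values of parse can be inspected locally. The following sketch copies the body of parse from above and feeds it a byte query string; the sample query and the parameter name x are made up for illustration.

```python
from urllib.parse import parse_qsl, unquote_to_bytes

def parse(query_string):
    """Same body as the server's parse(), reproduced for local experiments."""
    return dict(parse_qsl(query_string, errors='ignore')), unquote_to_bytes(query_string)

# Made-up query string with a repeated key and a percent-encoded zero byte.
args, raw = parse(b"cmd=ls&cmd=aaaa&x=%00")

print(args)  # {b'cmd': b'aaaa', b'x': b'\x00'}
print(raw)   # b'cmd=ls&cmd=aaaa&x=\x00'
```

The dictionary keeps only the last value of a repeated key, while the raw bytes still contain every key-value pair in the order it was sent.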
And now, let's see the actual API endpoints. For example, when you execute the authorize function inside mwe.py, the following authorize API endpoint will get called on the server with the values that you pass to it via the query string. If there is an error on the server side, you will get an error response such as 'No command specified' and the HTTP status code 400.
Run an authorized command.
Depending on the command, the endpoint returns either a list of files (for the ls command) or the contents of a file (for the cat command).
@hw02.route("/run/<int:uco>/")
def run(uco: int):
    args, decoded = parse(request.query_string)
    signature = request.cookies.get('signature')

    # Verify that the signature is valid
    if not signature:
        return jsonify({'error': 'Missing signature'}), 403
    try:
        verify(decoded, signature)
    except ValueError as error:
        return jsonify({'error': 'Invalid signature', 'detail': str(error)}), 403

    # Verify that the signature is not expired
    expiry = float(args.get(b'expiry'))
    if datetime.now(timezone.utc).timestamp() >= expiry:
        return jsonify({'error': 'Signature has expired'}), 403

    # Get the command and execute it (if it's "ls" or "cat")
    command = args.get(b'cmd')
    if command == b"ls":
        return list_files(uco)
    elif command.startswith(b"cat "):
        fname = command.split(b" ")[1].decode()
        return cat_file(uco, fname)
    else:
        return jsonify({'error': 'Unknown command'}), 403
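The command dispatch at the end of run can be tried out without Flask. The sketch below mirrors only its branch structure; the helper name dispatch and the file names are hypothetical, and the real endpoint calls list_files and cat_file instead of returning strings.

```python
def dispatch(command: bytes) -> str:
    """Mirror the branches of run() without Flask or file access (illustration only)."""
    if command == b"ls":
        return "ls"
    elif command.startswith(b"cat "):
        # Only the first token after "cat " is used as the file name.
        fname = command.split(b" ")[1].decode()
        return f"cat {fname!r}"
    return "unknown"

print(dispatch(b"ls"))               # ls
print(dispatch(b"cat notes.txt"))    # cat 'notes.txt'
print(dispatch(b"cat a.txt b.txt"))  # cat 'a.txt'
print(dispatch(b"ls -l"))            # unknown
```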
The API expects some parameters as part of the query string, that is, within the URL itself. In general, passing values between programs or services requires both sides (the sender and the receiver) to interpret the data in the same way. ASCII characters are usually handled with ease, but sending special characters (where "special" depends on the context) and raw bytes can be troublesome. As an example, suppose the sender intends to send the zero byte 0x00, but the receiver interprets it as the four characters 0, x, 0, 0. As another example, the Python comparison bytearray(b'\x00')[0] == 0x00 evaluates to True, but to send the zero byte as part of a URL you need to encode it differently and send %00. While these differences may seem nitpicky, they really aren't. You can read more about URL encoding on Wikipedia and about query strings in general.
Going back to mwe.py, you might notice that we intentionally do not use the way recommended in the docs to send the query parameters, namely the params keyword argument. Since you are free to pick a different language, be careful and pay attention to how the query string is actually created. For Python, have a look at the functions unquote_to_bytes and quote_from_bytes that we import in mwe.py.
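As a small illustration of these two functions, the sketch below percent-encodes a byte string containing non-printable bytes and decodes it again. The base URL, the number in the path, and the parameter name note are placeholders invented for this example, not the real endpoint; see mwe.py for how the actual requests are sent.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes

# The zero byte and the text "0x00" are different things.
zero_byte = b"\x00"
four_chars = b"0x00"
assert len(zero_byte) == 1 and len(four_chars) == 4

# Percent-encode raw bytes, keeping '=' and '&' readable as delimiters.
raw_query = b"cmd=ls&note=\x00\x80 end"
encoded = quote_from_bytes(raw_query, safe="=&")
print(encoded)  # cmd=ls&note=%00%80%20end

# Decoding restores exactly the original bytes.
assert unquote_to_bytes(encoded) == raw_query

# Hypothetical base URL; the query string is appended by hand.
BASE_URL = "https://example.invalid/run/123456/"
url = BASE_URL + "?" + encoded
print(url)
```

Building the query string by hand like this makes it easier to check exactly which bytes end up in the URL.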
You should submit a zip-file containing three, optionally four (including the llm.txt), things:
Recovering the correct contents of the secret file is worth 7 points, and the description is worth the remaining 3 points. However, a submission with just the contents of the secret file and no description is worth 0 points. If you don't complete the task, submit a description of where you got stuck and the code you used. A submission that does not conform to the format above incurs a 0.5-point penalty.