research-school-2024/findings/common-issues-1.md
Ari Archer 6fedbcaf8b
Create findings.
Signed-off-by: Ari Archer <ari@ari.lt>
2024-12-27 05:52:50 +02:00

92 lines
8.9 KiB
Markdown

# Common issues (1)
2024-12-26, I received the code of 4 students, after a quick manual review, I found the following common pitfalls the students faced (TL;DR):
1. Project Configuration and Security:
- Poor configuration of the Flask app: not setting URL scheme, insecure secret keys, settings for session cookies.
- Hardwiring of credentials insecurely, such as static passwords or no password at all.
2. Input Validation:
- Missing date and email input validation
- Possible bug about accessing key w/o prior check on its existence
3. Code Structure and Quality:
- Large functions instead of being modular.
- Functions don't have error handling.
- Poor coding practices, for example, useless loops, loading all results into memory, etc.
4. Session and Authentication:
- Storing the status of being an admin in the session and not in the database.
- Poor session management - examples include not setting lifetime of the session.
5. Cryptographic Practices:
- The use of obsolete hashing algorithms such as MD5 instead of more modern, secure options like SHA2, SHA3, BLAKE2, and Argon2.
- Insecure data storage at rest, that is, not hashing passwords and storing them as plain text in the database.
6. Template Security:
- No output encoding in templates, resulting in XSS vulnerabilities.
7. Coding Standards and Practices:
- Bad code quality (e.g., no type hints, unused imports, bad formatting).
- Use of third-party libraries without mirroring them. Hotlinking.
- Bad configuration, insecure practices due to a lack of validation of user inputs, improper error handling, hardcoded credentials, inefficient code, and old cryptographic algorithms used by the majority of projects.
If you want to see the full email:
<details>
<summary>Click this to expand the email</summary>
```
Hi!
I've reviewed the code of the projects without digging very deep into it or using advanced scanning tools, and I have a couple of critiques. I don't know if you want to share them with students or have them or anything, but here they are.
1. `<...>`
- No proper Flask app configuration; That is, no set preferred URL scheme, a set static secret key over a cryptographically secure one (os.urandom(1024) is what you would use, but they chose a static one like 'MySecretKey'), no session cookie configuration (like SESSION_COOKIE_SAMESITE, _SECURE, and _HTTPONLY). The lack of configuration as well as lack of secure keys leads to a compromise of cookies and ability to forge Flask (session) cookies... Which isn't good of course.
- The hardcoded and insecure credentials raise an eyebrow. They should not be hardcoded and they should be secure rather than having an empty password in the file directly, maybe make use of environment variables?
- Lack of input validation is also an area for concern. For example, immediately trying to access a key in a hashmap (well flask.Form or whatever it is, I guess, but it's most likely a hashmap under the hood) without checking whether it exists (`in` keyword) or at least using an intermediate function like .get() and later checking for `is None`. Or not validating dates, emails, etc.
- The functions can be split out into smaller, simpler parts rather than having one big function that does everything. The cursor of the DB could also be reused which would improve performance of the application and lead to overall better quality of code.
- Lack of error handling everywhere. Nowhere are any errors properly handled or reported which once again can lead to many issues.
- Storing of admin status in session sounds like a HORRIBLE idea. What; Store it in the database! Or at least implement a hybrid approach with periodic syncs.
- I see useless loops which can be avoided by using fetchone() over fetchall(), or by just... Not using loops?
- The use of MD5 is also one area that can and should be improved by using SHA2, SHA3, or BLAKE2 family of hashing functions, if you're extra - Argon2, bcrypt, etc. MD5, by today's standards, is not acceptable due to its relatively high collision probability as well as efficiency and speed of attacks. Overall it's an obsolete hashing algorithm with many better alternatives (such as the aforementioned ones), so why use MD5 out of everything??
- No output encoding is done in the templates, like applying the `escape` filter in the Jinja2 template, which leads to XSS.
- Bad code quality. Generally the python linters in my Vim configuration are crying, like, no type hints, unused imports, use of `from flask import <100 functions>` over just using modules, stray whitespace, useless comments, no code wrapping, bad formatting that doesn't properly abide by PEP8, no base case handling thus leaving large chunks of code under a singular if clause, and bad visual grouping of code. Most of this can be resolved by using tools like Pyflakes, Pyright, Black, Isort, etc. to improve the general code quality and experience. Use blueprints for more modular design.
- Use of 3rd party libraries without mirroring them. I remember this funny story with faker.js where thousands of projects broke just because they didn't mirror the code and just used the off-side jsdlvr or npm CDN repositories, where the author of faker.js got fed up with corporate stealing and abusing the OSS community while he himself was struggling badly. Also, there was this one JS library I can't recall the name of that got hijacked leaving thousands of projects being compromised because, once again, they used off-site mirrors of code. Generally it's a terrible idea to use jsdlvr.net or anything other than mirrors since this impacts the FE security badly.
- Bad session management? The session object is used for authentication, but it doesn't seem to be permanent, so maybe it's smart to set the session lifetime to like 30 minutes?
- Lack of typing in dynamic routes is also bad. The user_id is clearly an integer whereas flask's default type is `string`, maybe instead of `/delete/<user_id>` it'd be better to do `/delete/<int:user_id>`? This would not only add a layer of error checking, but would also make the route more specific and less error prone.
- The `debug=True` can be an area for concern in production, though in prod you'd probably use a production-ready WSGI server like Gunicorn anyway, so guess doesn't really matter anyway.
- Lack of proper status codes (like 4xx) on certain state (like registration failure), even though not impacting security, is sad that it's missing.
- Loading all results into memory is not a good idea since there may be a lot of records, which could lead to problems on the server due to memory being filled. Maybe it's just better to stick to the default generating iterator to fetch results as you need it without using fetchall() to load it all into memory?
Generally, the code doesn't operate on a good security model. If the source code was released - the system would be quickly compromised.
2. `<...>`
Same critiques as #1 apply here as well. In fact, to me the backend code and project structure looks almost the same as project_1 except they use werkzeug.security for password hashing, which is good. It also uses a blueprint for better modularity, I guess.
I believe werkzeug uses scrypt under the hood, which is a positive since it's generally more secure than SHA256 due to its memory-hard nature, yet using lower power consumption.
3. `<...>`
Same critiques as #1 apply here as well, though, I see a few improvements:
1. The DB cursor is reused, allowing for better performance.
2. Proper status codes are used to indicate the client errors.
3. General formatting improvements, leading to better code quality.
4. Use of dictionary=True can lead to less error-prone DB access, which is a positive.
However, there's other major flaws like storage of plain text passwords, which is completely unacceptable. Use password hashing using, for example, Python's built-in hashlib.
4. `<...>`
Once again, many of the critiques of #1 apply here, however, there's other problems I see as well, such as:
1. Use of plain text passwords once here again, which is bad. Use password hashing and mentioned before.
2. Lack of parameterised queries also has huge security implications, and not only can, but will lead to SQL injection.
3. The DB connections are not reused, which has raises performance questions. Connecting to the database every time leads to terrible performance. Let alone not using a context manager.
5. General overview
There are common issues in all the projects, which include poor configuration, insecure practices, missing input validation, and insufficient error handling. Other factors include the presence of credentials hardcoded into the source code, inefficient code, and the use of obsolete cryptographic algorithms. Improvements on these lines will enhance safety, performance, and ease of maintenance. Further, modularization of design, proper error reporting, and adherence to coding standards such as PEP8 would contribute to quality code.
<... irrelevant ...>
```
</details>