Sunday, July 13, 2025

HTTP Requests General Anonymizer



In this post we will review the steps required to create an HTTP requests anonymizer. 


Anonymizing HTTP requests is required when we want to store the requests sent to a site or a web service. We can then use the stored data internally for research without exposing the customers' Personally Identifiable Information (PII).

Anonymization of an HTTP request should handle the following items:

  • The request path
  • The user agent header
  • The cookies
  • The source IP
  • General headers treatment

Let's review how to handle the request path, as an example that can later be applied to most of the other HTTP request items. The request path is composed of multiple sections. For example, the path:

/api/v3/store/2345678/item/86767/update-credit/1111-1111-1111-1111

contains multiple sections separated by a slash character. To anonymize the path we should treat each section on its own, trying to identify known PII items.

The first PII type to search for is a national ID number, for example a Brazilian ID, a US SSN, a French ID, etc.
Notice that each country's ID has its own structure, its own separator characters that should be ignored, and its own check digit to verify. You can easily use pre-made libraries such as python-stdnum to identify these IDs. Alternatively, use an AI engine such as ChatGPT to generate short code that identifies a specific country's ID.
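
As an illustration of the structure, ignored separators, and check digits mentioned above, here is a minimal sketch for one specific ID type, the Brazilian CPF. The function name is_valid_cpf is hypothetical and the code is hand-written for this example; python-stdnum provides a maintained validator for the same number, which is probably the better choice in real code.

```python
import re

def is_valid_cpf(candidate: str) -> bool:
    """Return True if candidate has the shape and check digits of a Brazilian CPF."""
    # Drop the separators that are conventionally ignored (dots, dash, spaces).
    digits = re.sub(r"[.\-\s]", "", candidate)
    if not re.fullmatch(r"[0-9]{11}", digits):
        return False
    # A single repeated digit passes the arithmetic but is not accepted as a CPF.
    if digits == digits[0] * 11:
        return False

    def check_digit(prefix: str) -> int:
        # Weights run from len(prefix) + 1 down to 2.
        weights = range(len(prefix) + 1, 1, -1)
        total = sum(int(d) * w for d, w in zip(prefix, weights))
        remainder = total % 11
        return 0 if remainder < 2 else 11 - remainder

    first = check_digit(digits[:9])
    second = check_digit(digits[:9] + str(first))
    return digits[9:] == f"{first}{second}"
```

A path section such as 111.444.777-35 passes this check, while a plain store number such as 2345678 does not, so only the first would be tagged as a national ID.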

Other PII types to search for are:
  • Gender
  • Address
  • Person name
  • Email
  • Credit card
  • Passport number
  • Zip code
  • Phone number
Each of these types has its own detection method. For example, a person name can be identified by holding a list of 100K known names. An alternative is to use an LLM engine to find PII, but this has much higher costs.
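
The per-type detection described above could be sketched as follows for two of the types, email and person name. Everything here is an assumption made for illustration: the tiny KNOWN_NAMES set stands in for a real list of 100K names, the email pattern is deliberately loose, and detect_pii_type is a hypothetical helper name.

```python
import re
from typing import Optional

# Hypothetical stand-in for a list of ~100K known first and last names.
KNOWN_NAMES = {"alice", "bob", "maria", "john", "smith"}

# Loose email shape: local part, "@", domain with at least one dot.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def detect_pii_type(value: str) -> Optional[str]:
    """Return a PII type label for value, or None if nothing matched."""
    if EMAIL_RE.fullmatch(value):
        return "EMAIL"
    # Split on common separators so "maria-smith" is checked token by token.
    tokens = [t for t in re.split(r"[\s._-]+", value.lower()) if t]
    if tokens and all(t in KNOWN_NAMES for t in tokens):
        return "PERSON_NAME"
    return None
```

With this sketch, alice@example.com is labeled EMAIL and maria-smith is labeled PERSON_NAME, while update-credit is left alone because neither token is in the name list.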

Once we detect a PII value, we need to replace it with an anonymized value. We can keep some information about the original value, for example its type and its length. The request path above could be anonymized to:

/api/v3/store/$$$GENERAL_ID$LEN7$$$/item/$$$GENERAL_ID$LEN5$$$/update-credit/$$$CREDIT_CARD$LEN19$$$
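
A minimal sketch of this replacement step, assuming only two shape-based detectors, might look like the code below. The DETECTORS table and the function names are hypothetical, and the credit card pattern checks the shape only, without a checksum, so it leans toward over-matching.

```python
import re

# Hypothetical detectors, checked in order; the first match wins.
DETECTORS = [
    # Four groups of four digits, optionally separated by dashes or spaces.
    ("CREDIT_CARD", re.compile(r"[0-9]{4}([- ]?[0-9]{4}){3}")),
    # Any other run of four or more digits is treated as an opaque identifier.
    ("GENERAL_ID", re.compile(r"[0-9]{4,}")),
]

def anonymize_section(section: str) -> str:
    """Replace a section with a placeholder that keeps its type and length."""
    for pii_type, pattern in DETECTORS:
        if pattern.fullmatch(section):
            return f"$$${pii_type}$LEN{len(section)}$$$"
    return section

def anonymize_path(path: str) -> str:
    """Anonymize each slash-separated section of the path on its own."""
    return "/".join(anonymize_section(s) for s in path.split("/"))
```

Calling anonymize_path on the example path keeps api, v3, store, item and update-credit as they are, and replaces the two numeric sections and the card number with placeholders that record their type and length.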

Notice that an anonymizer can make both kinds of errors:
  1. False Positive - when a non-PII field is anonymized
  2. False Negative - when a PII field is not anonymized

As in many fields, there is a trade-off between False Positives and False Negatives. We would generally prefer more False Positives and fewer False Negatives, since the risk of keeping a non-anonymized PII value is high.
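
One way to keep an eye on this balance is to run the anonymizer over a small hand-labeled sample and count both kinds of errors. The sketch below assumes such a sample is available as pairs of booleans; count_errors is a hypothetical helper, not part of any library.

```python
from typing import Iterable, Tuple

def count_errors(samples: Iterable[Tuple[bool, bool]]) -> Tuple[int, int]:
    """Count errors over (is_pii, was_anonymized) pairs.

    Returns (false_positives, false_negatives).
    """
    false_positives = 0
    false_negatives = 0
    for is_pii, was_anonymized in samples:
        if was_anonymized and not is_pii:
            false_positives += 1  # a harmless value was masked
        elif is_pii and not was_anonymized:
            false_negatives += 1  # a PII value leaked through
    return false_positives, false_negatives
```

Given the preference stated above, a detector change that raises the first count while lowering the second would usually be acceptable, and the opposite would not.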
