feat: add certificate metrics to agent for NGINXaaS #1731
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1731 +/- ##
==========================================
- Coverage 84.88% 84.64% -0.24%
==========================================
Files 105 111 +6
Lines 13632 13897 +265
==========================================
+ Hits 11571 11763 +192
- Misses 1538 1602 +64
- Partials 523 532 +9
status:
  class: receiver
  stability:
Note: the mdatagen schema requires stability to be defined at both the receiver level (status.stability: beta: [metrics]) and the metric level (stability.level: development).
e563b15 to 4ba54d3
}

for _, path := range c.cfg.CertFilePaths {
	cert, err := parseCertFile(path)
A couple of issues to think about here:
- There are potentially a lot of certs, and parsing them can be non-trivial work to do every 15s.
- A path may contain more than one certificate.
Do we have any notification mechanism for when c.cfg changes?
One option: walk all the file paths once to extract every cert, and keep a list of the certs with the data we need for each one (expiration, path, public key algorithm, serial, etc.) along with the file's mtime.
Then on each scrape we just iterate through that list and stat each file to see whether it has changed and needs to be reparsed.
Made some changes to the scraper to address your feedback:
- There are potentially a lot of certs, and parsing them can be non-trivial work to do every 15s.
Added an mtime-based cache. Each scrape calls os.Stat per file; if the mtime is unchanged, we skip the read and parse and use the cached certs.
- A path may contain more than one certificate.
parseCertFile now loops pem.Decode until the input is exhausted instead of stopping at the first block. Each cert gets its own data point.
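For readers following along, the multi-certificate loop described above could look roughly like the sketch below. parseCertsPEM and selfSignedPEM are hypothetical names for illustration; the actual parseCertFile in this PR may differ (for example, in how it treats non-certificate blocks).

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"time"
)

// parseCertsPEM returns every certificate found in data,
// skipping non-certificate blocks such as private keys.
func parseCertsPEM(data []byte) ([]*x509.Certificate, error) {
	var certs []*x509.Certificate
	for {
		var block *pem.Block
		block, data = pem.Decode(data)
		if block == nil {
			break // no further PEM blocks
		}
		if block.Type != "CERTIFICATE" {
			continue
		}
		cert, err := x509.ParseCertificate(block.Bytes)
		if err != nil {
			return nil, fmt.Errorf("parse certificate: %w", err)
		}
		certs = append(certs, cert)
	}
	return certs, nil
}

// selfSignedPEM builds a throwaway self-signed certificate for the demo.
func selfSignedPEM(cn string, serial int64) []byte {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		panic(err)
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(serial),
		Subject:      pkix.Name{CommonName: cn},
		NotBefore:    time.Now(),
		NotAfter:     time.Now().Add(24 * time.Hour),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		panic(err)
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der})
}

func main() {
	// A bundle with two certificates in one file, as in a leaf + CA chain.
	bundle := append(selfSignedPEM("leaf.example.test", 1), selfSignedPEM("ca.example.test", 2)...)
	certs, err := parseCertsPEM(bundle)
	if err != nil {
		panic(err)
	}
	for _, c := range certs {
		// One line per certificate, mirroring one data point per certificate.
		fmt.Println(c.Subject.CommonName, c.SerialNumber, c.PublicKeyAlgorithm, c.NotAfter.Unix())
	}
}
```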
4ba54d3 to f7f48ee
f7f48ee to 9b7c983
Required by mdatagen for nginxplusreceiver, nginxreceiver, and containermetricsreceiver metrics. No behaviour change.
Add metadata.yaml defining nginx.certificate.expiry (gauge, Unix timestamp) with attributes file_path, public_key_algorithm, serial_number, subject.common_name. Add CertificateReceiver config type with InstanceID and CertFilePaths []string.
Run: cd internal/collector/certificatereceiver && mdatagen metadata.yaml
Scraper reads cert files via crypto/x509 on each 15s scrape and emits nginx.certificate.expiry (Unix timestamp) per cert, so renewals are picked up on the next scrape without a collector restart. Gated on FeatureCertificates. The collector restarts only when the set of watched cert file paths changes.
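The "restart only when the set of watched paths changes" rule can be illustrated with a small set comparison. This is only a sketch: samePathSet and pathSet are hypothetical helpers, and the PR may detect the change differently.

```go
package main

import "fmt"

// pathSet collapses a slice of paths into a set, dropping duplicates.
func pathSet(paths []string) map[string]struct{} {
	set := make(map[string]struct{}, len(paths))
	for _, p := range paths {
		set[p] = struct{}{}
	}
	return set
}

// samePathSet reports whether a and b contain the same paths,
// ignoring order and duplicates.
func samePathSet(a, b []string) bool {
	sa, sb := pathSet(a), pathSet(b)
	if len(sa) != len(sb) {
		return false
	}
	for p := range sa {
		if _, ok := sb[p]; !ok {
			return false
		}
	}
	return true
}

func main() {
	old := []string{"/etc/ssl/a.pem", "/etc/ssl/b.pem"}

	// Same paths in a different order: no restart needed.
	fmt.Println(samePathSet(old, []string{"/etc/ssl/b.pem", "/etc/ssl/a.pem"})) // true

	// A path was removed: the watched set changed.
	fmt.Println(samePathSet(old, []string{"/etc/ssl/a.pem"})) // false
}
```

A renewal that rewrites a file in place leaves the set unchanged, which is why it would not trigger a restart under this rule.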
9b7c983 to 54501f1
NGINXAAS-1315: Certificate expiry metric receiver
Motivation
As a platform engineer managing NGINXaaS deployments, I want to be alerted before a certificate expires. This alert should come from the same monitoring stack, following existing metrics patterns, and the metric labels should help identify which cert is the problem: common name, file path, algorithm, serial number.
nginx-agent already indexes every certificate nginx is using as part of config parsing. This change makes that data useful by exporting it as a metric, giving operators a simple threshold alert on nginx.certificate.expiry without any additional tooling.
The receiver is separate from the existing nginx/nginxplus receivers because it covers a distinct concern (TLS hygiene vs. traffic metrics), it can emit a lot of data points on cert-heavy deployments, and it should be easy to enable or disable independently.
Implementation
Adds a certificate OTel receiver that scrapes cert files via crypto/x509 every 15s and emits nginx.certificate.expiry, a gauge of the Unix timestamp at which each cert expires.
Attributes: file_path, public_key_algorithm, serial_number, subject.common_name
Resource attribute: instance.id | Gated on: FeatureCertificates
Renewals (same path, new cert) are reflected on the next scrape without a collector restart. The collector only restarts when the set of watched paths changes.
Commit Descriptions
3793b30 0e8a6bf dfc308f ce0e4ec
Checklist
Before creating a PR, run through this checklist and mark each as complete.
- CONTRIBUTING document
- make install-tools and have attached any dependency changes to this pull request
- README.md