Standardized protocol #31

Open
foxycode opened this issue Mar 7, 2020 · 18 comments

Comments

@foxycode
Contributor

foxycode commented Mar 7, 2020

  • It would be nice to have a standardized and, above all, documented protocol.
  • Sending JSON instead of raw data wouldn't hurt either. If New Relic can do it, why not you?
  • Things like disk health should be evaluated in the script, not on your backend, so that anyone can write their own implementation.
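As a minimal sketch of the last two points, an agent could reduce `smartctl -H` output to a verdict locally and send that instead of the raw text. The helper names `smart_verdict` and `disk_health_json` and the JSON field names `disk` and `healthy` are hypothetical; they are not part of any documented HetrixTools protocol. The only real inputs assumed are smartctl's verdict lines ("...self-assessment test result: PASSED" for ATA, "SMART Health Status: OK" for SCSI).

```shell
#!/bin/sh
# Hypothetical helpers: names and JSON schema are illustrative only.

# smart_verdict: read `smartctl -H` output on stdin and print
#    1 if smartctl reports PASSED (ATA) or OK (SCSI),
#    0 if it reports any other verdict,
#   -1 if no verdict line is present (smartctl missing, unsupported device).
smart_verdict() (
  out=$(cat)
  if printf '%s\n' "$out" | grep -Eq 'test result: PASSED|SMART Health Status: OK'; then
    printf '%s\n' 1
  elif printf '%s\n' "$out" | grep -Eq 'test result: |SMART Health Status: '; then
    printf '%s\n' 0
  else
    printf '%s\n' -1
  fi
)

# disk_health_json: emit one JSON object per disk. Typical device names
# (sda, c1t0d0) need no JSON escaping, so plain printf is enough here.
disk_health_json() {
  printf '{"disk":"%s","healthy":%s}\n' "$1" "$2"
}
```

For example, `disk_health_json sda "$(smartctl -H /dev/sda | smart_verdict)"` would print `{"disk":"sda","healthy":1}` for a drive whose self-assessment passed. The backend then only has to store a number, and anyone can produce it with whatever disk tooling their OS provides.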
@foxycode
Contributor Author

foxycode commented Mar 7, 2020

I made a mistake: base64 was missing after a system update, and that was why the script wasn't working. Sorry. Still, it would be nice to have a standardized protocol.

@foxycode foxycode changed the title "Compatibility between versions" to "Standardized protocol" Mar 7, 2020
@hetrixtools
Owner

Hello,

I can confirm that the back-end code for each version does not change when newer agent versions are released, meaning that all of our older agents are still fully compatible and working.

We'll work on a standardized protocol for future agent versions.

Thanks for the feedback.

@sholwe

sholwe commented Aug 9, 2020

Can we get a documented API? I'd like to extend this beyond Linux as well. It was trivial to make it work on Alpine, but some of it is too Linux-specific to support the BSDs. I'd prefer to wait for something documented before writing a compatible posting tool.

Thanks!

@foxycode
Contributor Author

foxycode commented Aug 9, 2020

I'd like a documented API too. Right now, some features like RAID health are parsed and processed on the backend, which isn't ideal.

@hetrixtools
Owner

Hello,

We'll be working on a standardized protocol for our agent in a future release, along with documentation for it.

Thank you for the feedback.

@foxycode
Contributor Author

Do you have any release date?

@hetrixtools
Owner

> Do you have any release date?

Unfortunately not at this time.

@sholwe

sholwe commented Aug 17, 2020

Most of this was done by hand, since I can't always use Linuxisms. This works for version 1.59; I've implemented (most of) it for OpenBSD.

POSTDATA="v=$VERSION&s=$SID&d=$OS|$Uptime|$CPUModel|$CPUSpeed|$CPUCores|$CPU|$IOW|$RAMSize|$RAM|$SwapSize|$Swap|$DISKs|$NICS|$ServiceStatusString|$RAID|$DH|$RPS1|$RPS2|$IOPS|$CONN|$DISKi"

v = current version string, e.g. 1.59 (may be a decimal with 2-digit precision)
s = local system string hash (Site ID)
d = string [see below; fields are separated by pipes]
OS (b) = string: short name (or $(uname -s)$(uname -r)), "|", $(uname -r), "|", RequiresReboot as an int (1 = true, 0 = false)
Uptime = seconds since boot
CPUModel (b) = string
CPUSpeed (b) = CPU speed (int)
CPUCores = number of cores (int)
CPU = average CPU usage (%) over the post period
IOW = IOWait, decimal with 2-digit precision
RAMSize = total RAM size (MB)
RAM = used RAM, as a percentage
SwapSize = total swap size (MB)
Swap = used swap, as a percentage
NICS (gb) = (array) |interface;inbytes;outbytes;|interface;inbytes;outbytes;...
DISKs (gb) = (array) mount point, total size (bytes), available (bytes)
RAID (gb) = {{have not implemented}}
DH (gb) = {{have not implemented}} (array) {lsblk name"|{smartctl -H}|"...}
RPS1 = unimplemented
RPS2 = unimplemented
IOPS (gb) = {{have not implemented}}
CONN (b) = (array) PortNumber|NumberOfConnectionsToPort (each entry ends with ;)
DISKi (gb) = (array) mount point, total inodes, used inodes, available inodes (each entry ends with ;)

(g) = compressed with gzip -cf before posting
(b) = base64-encoded, then passed through base64prep() (in the script)
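To make the encoding notes concrete, here is a minimal sketch of preparing a "(gb)" field and assembling the POST body in the field order of the POSTDATA line above. The helper names `gb_encode`, `gb_decode` and `build_postdata` are hypothetical. It assumes that `\n`-style escapes in field values are expanded before compression (as the literal `\n` sequences in the DH snippet later in this thread suggest) and that `base64` understands `-d` (GNU coreutils, BusyBox). Encoded values still need the base64prep() escaping before they go into the body.

```shell
#!/bin/sh
# Hypothetical helpers; not taken from the HetrixTools agent.

# gb_encode: gzip a field value, then base64 it on a single line.
# printf '%b' expands \n-style escapes before compression.
gb_encode() {
  printf '%b' "$1" | gzip -cf | base64 | tr -d '\n'
}

# gb_decode: reverse of gb_encode, handy to check what you are about to send.
gb_decode() {
  printf '%s' "$1" | base64 -d | gunzip -c
}

# build_postdata: "v=<version>&s=<site id>&d=<f1>|<f2>|..."
# Arguments after the first two are the d= fields, already encoded/escaped.
build_postdata() (
  v=$1 s=$2
  shift 2
  IFS='|'                     # "$*" joins the remaining fields with "|"
  printf 'v=%s&s=%s&d=%s\n' "$v" "$s" "$*"
)
```

Decoding your own output with `gb_decode` before posting is a cheap way to confirm a field really matches the layout listed above.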

Yes, this is really brief, and an enormous mess. The biggest issue I ran into is their "base64prep" function, which is nonstandard as well: it just changes the base64 output so it can be posted without being escaped by the web service. "+" is converted to "%2B" and "/" is rewritten to "%2F", kind of a mini urlencode().
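The substitution described above can be sketched portably with `sed`. This is not the agent's own code, only an equivalent of the mapping as described ("+" to "%2B", "/" to "%2F", "=" padding left alone); `base64unprep` is a hypothetical reverse, e.g. for a test receiver.

```shell
#!/bin/sh
# Escape the two base64 characters that a form-encoded POST body would mangle.
base64prep() {
  printf '%s\n' "$1" | sed -e 's/+/%2B/g' -e 's#/#%2F#g'
}

# Hypothetical reverse mapping for a receiver or a local test harness.
base64unprep() {
  printf '%s\n' "$1" | sed -e 's/%2B/+/g' -e 's#%2F#/#g'
}
```

The round trip is unambiguous because the base64 alphabet never contains "%", so any "%2B" or "%2F" in the escaped string must have come from this substitution.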

The way the script gets average network data is one of the most bizarre things I've seen to date. It builds an array and loops several times, incrementing over the period it expects to run (roughly a minute). Since I can rely on getting fairly normalized data over a period of time, I take a snapshot when it first runs, then count the bytes sent/received roughly 52 seconds later, just before the script echoes its output. Still a cheat, but accurate enough for a 0.01 release.
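The two-snapshot approach above can be sketched as follows. `nic_entry` formats one entry in the NICS layout listed earlier (`|interface;inbytes;outbytes;`); it assumes the backend wants per-second averages, which the thread does not confirm (it could expect totals for the period). `read_counters` is a Linux-only illustration using sysfs; on OpenBSD you would parse something like `netstat -ibn` instead. Both names are hypothetical.

```shell
#!/bin/sh
# nic_entry IFACE RX_START RX_END TX_START TX_END SECONDS
# Prints "|IFACE;<rx bytes/s>;<tx bytes/s>;" (integer division, no wraparound handling).
nic_entry() {
  printf '|%s;%d;%d;' "$1" $(( ($3 - $2) / $6 )) $(( ($5 - $4) / $6 ))
}

# read_counters IFACE: print "rx tx" byte counters (Linux sysfs only).
read_counters() {
  printf '%s %s\n' "$(cat "/sys/class/net/$1/statistics/rx_bytes")" \
    "$(cat "/sys/class/net/$1/statistics/tx_bytes")"
}
```

In use, you would take one `read_counters eth0` snapshot at start, `sleep 52`, take a second one, and append the `nic_entry` result to `$NICS` before gzip/base64 encoding. A counter that wraps between snapshots would produce a negative value, which a real agent would need to handle.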

@hetrixtools
Owner

@sholwe thank you for putting in the time to write all of this down.

We know that the agent data aggregation is quite messy at this time, the person who coded it did not do it justice; however, the collected stats are on par with many other tested tools.

We'll work on a standardized protocol, along with more code cleanup/optimization, in the next major agent release version.

Thanks again for your time and effort.

@sholwe

sholwe commented Aug 17, 2020

Hi @hetrixtools -

As @foxycode has stated, your service seems to take much of this raw data and decide what to do with it when it's parsed on your end. That means we'll need to adapt any specific information for the RAID, etc., and hope that it's handled correctly. Can we get a basic post system that you simply store and aggregate, without whatever logic is being applied there?

Thanks! When I clean it up, I'll submit my OBSD code to you. I haven't got a FreeBSD box at the moment, but since the code is primarily sysctl/netstat based, it shouldn't take much effort.

@foxycode
Contributor Author

Since it's relevant, I'll add a link to my SmartOS/Solaris fork: https://github.com/sunfoxcz/hetrixtools-agent-smartos/tree/smartos

@sholwe

sholwe commented Aug 17, 2020

Yaay! I miss Solaris. 2.6 5/98 will forever be in my heartworms. Here's an OpenBSD "functional" version.

https://github.com/sholwe/hetrixtools-agent-openbsd

@foxycode
Contributor Author

@hetrixtools Maybe it would be nice to add links to the forks in the repository README?

@hetrixtools
Owner

@foxycode added.

Thank you everyone for your contributions.

@foxycode
Copy link
Contributor Author

@hetrixtools Any progress with the standardized protocol? My agent implementation won't show SMART status after upgrading to the latest SmartOS version, and once again I have no idea why and can't debug it.

@sholwe

sholwe commented Feb 20, 2022

@foxycode It's going to be here:

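if [ "$CheckDriveHealth" -gt 0 ]
then
	# S.M.A.R.T. for regular HDD/SSD; disks are listed with the illumos/SmartOS diskinfo tool
	if [ -x "$(command -v smartctl)" ]
	then
		for i in $(diskinfo -cH | grep -v "??R" | awk '{ print $2 }')
		do
			DHealth=$(smartctl -A "/dev/rdsk/$i")
			# Only report disks that return a SMART attribute table
			if grep -q 'Attribute' <<< "$DHealth"
			then
				# Prepend the overall health verdict to the attribute table
				DHealth=$(smartctl -H "/dev/rdsk/$i")"\n$DHealth"
				# Entry: |1\n<disk>\n<smartctl output>\n (type 1 = S.M.A.R.T.)
				DH="$DH|1\n$i\n$DHealth\n"
			fi
		done
	fi
	# nvme-cli for NVMe drives; note that lsblk is a Linux (util-linux) tool
	if [ -x "$(command -v nvme)" ]
	then
		for i in $(lsblk -l | grep 'disk' | awk '{ print $1 }')
		do
			DHealth=$(nvme smart-log "/dev/$i")
			if grep -q 'NVME' <<< "$DHealth"
			then
				if [ -x "$(command -v smartctl)" ]
				then
					# ${i%??} drops the last two characters, e.g. nvme0n1 -> nvme0
					DHealth=$(smartctl -H "/dev/${i%??}")"\n$DHealth"
				fi
				# Entry: |2\n<disk>\n<output>\n (type 2 = NVMe)
				DH="$DH|2\n$i\n$DHealth\n"
			fi
		done
	fi
fi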

I'm afraid I haven't touched SmartOS in ages. Check whether smartctl has been deprecated or its output format has changed. You can still use my reverse-engineered POST data above to roll your own.

@foxycode
Contributor Author

@sholwe I already fixed it, but the problem is that smartctl output is analyzed on the HetrixTools side, which is a bad concept. No one can implement their own disk check. If you don't have a proper smartctl on your machine, you're out of luck.

@sholwe

sholwe commented Feb 20, 2022

Yikes. I noticed they were doing this with other data back in the 1.59 release. I saw your fork was based on 1.58, but I wasn't sure what might have changed.


3 participants