Standardized protocol #31

foxycode · 2020-03-07T17:06:21Z

It would be nice to have a standardized and, above all, documented protocol.
Sending JSON instead of raw data wouldn't hurt either. If New Relic can do it, why not you?
Things like disk health should be evaluated in the script, not on your backend, so anyone can write their own implementation.

foxycode · 2020-03-07T17:39:08Z

I made a mistake: base64 was missing after a system update, which is why the script wasn't working, sorry. Still, it would be nice to have a standardized protocol.

hetrixtools · 2020-03-09T09:57:17Z

Hello,

I can confirm that the back-end code for each version does not change when newer agent versions are released, meaning that all of our older agents are still fully compatible and working.

We'll work on a standardized protocol for future agent versions.

Thanks for the feedback.

sholwe · 2020-08-09T19:32:05Z

Can we get a documented API? I'd like to extend this outside of Linux as well. It was trivial to make it work with Alpine, but some of this is too Linux-specific to support the BSDs. I'd prefer to wait for something documented before writing a compatible posting tool.

Thanks!

foxycode · 2020-08-09T21:17:10Z

I'd like a documented API too. Right now, some features like RAID health are parsed and processed on the backend, which isn't ideal.

hetrixtools · 2020-08-10T08:36:38Z

Hello,

We'll be working on a standardized protocol for our agent in a future release, along with documentation info regarding this.

Thank you for the feedback.

foxycode · 2020-08-10T09:08:45Z

Do you have any release date?

hetrixtools · 2020-08-10T09:56:55Z

> Do you have any release date?

Unfortunately not at this time.

sholwe · 2020-08-17T01:18:26Z

Most of this was done by hand, since I can't always use Linuxisms. This works for version 1.59; I've implemented (most of) it for OpenBSD.

POSTDATA="v=$VERSION&s=$SID&d=$OS|$Uptime|$CPUModel|$CPUSpeed|$CPUCores|$CPU|$IOW|$RAMSize|$RAM|$SwapSize|$Swap|$DISKs|$NICS|$ServiceStatusString|$RAID|$DH|$RPS1|$RPS2|$IOPS|$CONN|$DISKi"

v= current version string - 1.59 (may be decimal 2 precision)
s= Local system string hash (Site ID)
d= String [see below, all terminated with pipes]
OS (b)= String - Shortname or $(uname -s)$(uname -r)"|"$(uname -r)"|"RequiresReboot INT (1 true or 0)
Uptime = seconds since boot
CPUModel (b) = string
CPUSpeed (b) = speed of CPU (int)
CPUCores= int number of cores
CPU = Average of CPUSpeed for post period
IOW = IOWait decimal 2 precision
RAMSize = Complete RAM size (MB)
RAM = used RAM (MB) in percentage
SwapSize = Total (MB)
Swap = Used (MB) in percentage 
NICS (gb) = (array) "|"interface";"inbytes";"outbytes";""|"interface";"inbytes";"outbytes";'... 
DISKs (gb) =  (array) mount point, totalsize (bytes), available(bytes)
RAID (gb) = {{have not implemented}}
DH (gb) = {{have not implemented}} (array) {lsblk name"|{smartctl -H}|"...}
RPS1 = unimplemented
RPS2 = unimplemented
IOPS (gb) =  {{have not implemented}}
CONN (b) = (array) "PortNumber"|"NumberOfConnectionsToPort";"
DISKi (gb) = (array) mountpoint, total inodes, used inodes, available inodes";"

(g) noted is encoded to post with: gzip -cf
(b) noted as base64 encoded with base64prep() (in script)

Yes, this is really brief, and an enormous mess. The biggest issue I ran into is with their "base64prep" function, which is nonstandard as well - it just changes things so they can be posted without being escaped by the webservice. "+" is converted to "%2B" and "/" is rewritten to "%2F" - kind of a mini htmlspecialchars().
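A minimal sketch of the encoding described above, for anyone porting it. It assumes that (b) fields are base64-encoded and then escaped, and that (gb) fields are piped through `gzip -cf` first. The function bodies here are a re-implementation in plain POSIX sh, not the agent's own code, and `encode_b` / `encode_gb` are hypothetical helper names.

```shell
# Re-implementation of the escaping step: "+" becomes "%2B", "/" becomes "%2F".
# Assumption: this mirrors what the agent's base64prep does to base64 text.
base64prep() {
    printf '%s' "$1" | sed -e 's/+/%2B/g' -e 's|/|%2F|g'
}

# Hypothetical helper for (b) fields: base64, strip line wrapping, escape.
encode_b() {
    base64prep "$(printf '%s' "$1" | base64 | tr -d '\n')"
}

# Hypothetical helper for (gb) fields: gzip, base64, strip line wrapping, escape.
encode_gb() {
    base64prep "$(printf '%s' "$1" | gzip -cf | base64 | tr -d '\n')"
}
```

Using `sed` and `tr` instead of bash substitutions keeps the sketch usable on systems where `/bin/sh` is not bash, such as the BSDs.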

The way the script gets average network data is one of the most bizarre things I've ever seen to date. It makes an array and loops several times to increment over the period of time that it expects to run (roughly a minute). Since I can rely on getting pretty normalized data over a period of time, I take a snapshot when it first runs, then count the bytes sent/received before I have the script echo roughly 52 seconds later. Still a cheat, but accurate enough for a 0.01 release.
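The snapshot approach above boils down to one subtraction and one division. A minimal sketch, assuming the byte counters are read once at start and once at the end of the run; `counter_rate` is a hypothetical helper name, and how the counters are obtained (for example from netstat on the BSDs) is left out.

```shell
# Hypothetical helper: average bytes per second between two counter readings.
#   $1 = counter value at the first snapshot
#   $2 = counter value at the second snapshot
#   $3 = elapsed seconds between the snapshots
# Prints 0 if the interval is not positive or the counter went backwards
# (for example after an interface reset).
counter_rate() {
    cr_start=$1
    cr_end=$2
    cr_secs=$3
    if [ "$cr_secs" -le 0 ] || [ "$cr_end" -lt "$cr_start" ]; then
        echo 0
        return
    fi
    echo $(( (cr_end - cr_start) / cr_secs ))
}
```

For instance, `counter_rate 1000 53000 52` covers a 52-second window with 52000 bytes transferred. The division is integer division, so sub-byte precision is dropped.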

hetrixtools · 2020-08-17T09:11:46Z

@sholwe thank you for putting in the time to write all of this down.

We know that the agent data aggregation is quite messy at this time, the person who coded it did not do it justice; however, the collected stats are on par with many other tested tools.

We'll work on a standardized protocol, along with more code cleanup/optimization, in the next major agent release version.

Thanks again for your time and effort.

sholwe · 2020-08-17T14:41:24Z

Hi @hetrixtools -

As @foxycode has stated, your service seems to take much of this raw data and decide what to do with it when it's parsed on your end. That means we'll need to adapt any specific information for the RAID, etc, and hope that it's handled correctly. Can we get a basic post system for you to store and aggregate without whatever logic is being used there?

Thanks - when I clean it up, I'll submit my OBSD code to you; I haven't got a FreeBSD box at the moment, but since it's primarily sysctl/netstat based, shouldn't take much effort.

foxycode · 2020-08-17T15:03:32Z

Since it's relevant, I'll add a link to my SmartOS/Solaris fork: https://github.com/sunfoxcz/hetrixtools-agent-smartos/tree/smartos

sholwe · 2020-08-17T23:28:41Z

Yaay! I miss Solaris. 2.6 5/98 will forever be in my heartworms. Here's an OpenBSD "functional" version.

https://github.com/sholwe/hetrixtools-agent-openbsd

foxycode · 2020-08-17T23:35:52Z

@hetrixtools Maybe adding links to the forks in the repository README would be nice?

hetrixtools · 2020-08-18T09:49:06Z

@foxycode added.

Thank you everyone for your contributions.

foxycode · 2022-02-17T01:58:42Z

@hetrixtools Any progress on the standardized protocol? My agent implementation won't show SMART status after upgrading to the latest SmartOS version, and once again I have no idea why and can't debug it.

sholwe · 2022-02-20T20:20:05Z

@foxycode It's going to be here-

    if [ "$CheckDriveHealth" -gt 0 ]
    then
        if [ -x "$(command -v smartctl)" ] #Using S.M.A.R.T. (for regular HDD/SSD)
        then
            for i in $(diskinfo -cH | grep -v "??R" | awk '{ print $2 }')
            do
                DHealth=$(smartctl -A /dev/rdsk/$i)
                if grep -q 'Attribute' <<< $DHealth
                then
                    DHealth=$(smartctl -H /dev/rdsk/$i)"\n$DHealth"
                    DH="$DH|1\n$i\n$DHealth\n"
                fi
            done
        fi
        if [ -x "$(command -v nvme)" ] #Using nvme-cli (for NVMe)
        then
            for i in $(lsblk -l | grep 'disk' | awk '{ print $1 }')
            do
                DHealth=$(nvme smart-log /dev/$i)
                if grep -q 'NVME' <<< $DHealth
                then
                    if [ -x "$(command -v smartctl)" ]
                    then
                        DHealth=$(smartctl -H /dev/${i%??})"\n$DHealth"
                    fi
                    DH="$DH|2\n$i\n$DHealth\n"
                fi
            done
        fi
    fi

I'm afraid I haven't touched SmartOS in ages. Check to see if smartctl has been deprecated or the format has changed for the output. You can still use my above reverse engineered POST data to roll your own.

foxycode · 2022-02-20T20:53:58Z

@sholwe I already fixed it, but the problem is that the smartctl output is analyzed on the hetrixtools side, which is a bad design. No one can implement their own disk check. If you don't have a proper smartctl on your machine, you're out of luck.
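To illustrate what evaluating disk health in the script could look like, here is a minimal sketch that reduces captured `smartctl -H` output to a single flag. It is purely hypothetical: `smart_health_ok` is an invented name, and nothing in this thread indicates the current protocol has a field that accepts a client-side verdict. The two matched strings are the overall-health lines smartctl prints for ATA and SCSI devices.

```shell
# Hypothetical helper: print 1 if the given `smartctl -H` output reports a
# healthy drive, 0 otherwise (including empty or unrecognized output).
#   $1 = captured output of `smartctl -H <device>`
smart_health_ok() {
    case "$1" in
        *"test result: PASSED"*|*"SMART Health Status: OK"*)
            echo 1
            ;;
        *)
            echo 0
            ;;
    esac
}
```

A port for a platform without smartctl could produce the same 1/0 result from whatever tool it has, which is the point of moving the evaluation out of the backend.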

sholwe · 2022-02-20T20:55:57Z

Yikes. I noticed they were doing this with other data back for the 1.59 release. I saw you were based on 1.58, but wasn't sure what might have been changed.

foxycode changed the title ~~Compatibility between versions~~ Standardized protocol Mar 7, 2020

hetrixtools added the suggestion label Mar 9, 2020
