{"id":1018,"date":"2022-02-01T02:06:56","date_gmt":"2022-02-01T07:06:56","guid":{"rendered":"https:\/\/www.brunerd.com\/blog\/?p=1018"},"modified":"2022-02-02T16:05:12","modified_gmt":"2022-02-02T21:05:12","slug":"jpt-1-0-text-encoding-fun","status":"publish","type":"post","link":"https:\/\/www.brunerd.com\/blog\/2022\/02\/01\/jpt-1-0-text-encoding-fun\/","title":{"rendered":"jpt 1.0 text encoding fun"},"content":{"rendered":"\n<p>Besides JSON, <a rel=\"noreferrer noopener\" href=\"http:\/\/github.com\/brunerd\/jpt\" target=\"_blank\">jpt<\/a> (the <em>JSON<\/em> power tool) can also output strings and numbers in a variety of encodings that the sysadmin or programmer might find useful. Let&#8217;s look at the encoding options from the output of <code>jpt -h<\/code><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>% jpt -h\n...\n-T textual output of all data (omits property names and indices)\n  Options:\n\t-e Print escaped characters literally: \\b \\f \\n \\r \\t \\v and \\\\ (escapes formats only)\n\t-i \"&lt;value&gt;\" indent spaces (0-10) or character string for each level of indent\n\t-n Print null values as the string 'null' (pre-encoding)\n\n\t-E \"&lt;value&gt;\" encoding options for -T output:\n\n\t  Encodes string characters below 0x20 and above 0x7E with pass-through for all else:\n\t\tx \t\"\\x\" prefixed hexadecimal UTF-8 strings\n\t\tO \t\"\\nnn\" style octal for UTF-8 strings\n\t\t0 \t\"\\0nnn\" style octal for UTF-8 strings\n\t\tu \t\"\\u\" prefixed Unicode for UTF-16 strings\n\t\tU \t\"\\U \"prefixed Unicode Code Point strings\n\t\tE \t\"\\u{...}\" prefixed ES2016 Unicode Code Point strings\n\t\tW \t\"%nn\" Web encoded UTF-8 string using encodeURI (respects scheme and domain of URL)\n\t\tw \t\"%nn\" Web encoded UTF-8 string using encodeURIComponent (encodes all components URL)\n\n\t\t  -A encodes ALL characters\n\t\n\t  Encodes both strings and numbers with pass-through for all else:\n\t\th \t\"0x\" prefixed lowercase hexadecimal, UTF-8 
strings\n\t\tH \t\"0x\" prefixed uppercase hexadecimal, UTF-8 strings\n\t\to \t\"0o\" prefixed octal, UTF-8 strings\n\t\t6 \t\"0b\" prefixed binary, 16 bit _ spaced numbers and UTF-16 strings\n\t\tB \t\"0b\" prefixed binary, 8 bit _ spaced numbers and UTF-16 strings\n\t\tb \t\"0b\" prefixed binary, 8 bit _ spaced numbers and UTF-8 strings\n\n\t\t  -U whitespace is left untouched (not encoded)<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"strings\">Strings<\/h2>\n\n\n\n<p>While the last group of modes above (<code>h<\/code>, <code>H<\/code>, <code>o<\/code>, <code>6<\/code>, <code>B<\/code> and <code>b<\/code>) converts both numbers and strings, the first group works <em>only<\/em> on strings (numbers and booleans pass through unchanged). If you work with shell scripts, these techniques may be useful.<\/p>\n\n\n\n<p>If you store shell scripts in a database that&#8217;s not using <code>utf8mb4<\/code> table and column encodings, then you won&#8217;t be able to include snazzy emojis to catch your users&#8217; attention! In fact, this WordPress install was so old (almost 15 years!) that its default collation was still <code>latin1_swedish_ci<\/code>, an odd but surprisingly common default for many old blogs. Also, if you store your scripts in Jamf (still the case in v10.35 as of this writing), it uses <code>latin1<\/code> encoding and your 4-byte characters will get mangled. 
Below you can see that in Jamf things look fine while editing but get mangled once saved; the eventual workaround is to use an encoding like <code>\\x<\/code> escaped hex (octal is an alternative).<\/p>\n\n\n\n<figure class=\"wp-block-gallery has-nested-images columns-default is-cropped wp-block-gallery-1 is-layout-flex wp-block-gallery-is-layout-flex\">\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.brunerd.com\/blog\/wp-content\/uploads\/1EmojiScript-2.png\"><img loading=\"lazy\" decoding=\"async\" width=\"115\" height=\"50\" data-id=\"1038\" src=\"https:\/\/www.brunerd.com\/blog\/wp-content\/uploads\/1EmojiScript-2.png\" alt=\"\" class=\"wp-image-1038\"\/><\/a><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.brunerd.com\/blog\/wp-content\/uploads\/2EmojiScript-1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"131\" height=\"61\" data-id=\"1036\" src=\"https:\/\/www.brunerd.com\/blog\/wp-content\/uploads\/2EmojiScript-1.png\" alt=\"\" class=\"wp-image-1036\"\/><\/a><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/www.brunerd.com\/blog\/wp-content\/uploads\/3EmojiScript-1.png\"><img loading=\"lazy\" decoding=\"async\" width=\"215\" height=\"64\" data-id=\"1035\" src=\"https:\/\/www.brunerd.com\/blog\/wp-content\/uploads\/3EmojiScript-1.png\" alt=\"\" class=\"wp-image-1035\"\/><\/a><\/figure>\n<figcaption class=\"blocks-gallery-caption\">Left to right: 4-byte Unicode entered in Jamf, saved and mangled, \\x escaping workaround<\/figcaption><\/figure>\n\n\n\n<p>Let&#8217;s use the red &#8220;octagonal sign&#8221; emoji, which is a stop sign to most everyone around the world, with the exception of Japan and Libya (thanks, Google image search). 
Let&#8217;s look at some of the ways \ud83d\uded1 can be encoded in a shell script:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#reliable \\x hex notation for bash and zsh\n% jpt -STEx &lt;&lt;&lt; \"Alert \ud83d\uded1\"\nAlert \\xf0\\x9f\\x9b\\x91\n\n#above string can be used in both bash and zsh\n% echo $'Alert \\xf0\\x9f\\x9b\\x91'\nAlert \ud83d\uded1\n\n#also reliable, \\nnn octal notation\n% jpt -STEO &lt;&lt;&lt; \"Alert \ud83d\uded1\"\nAlert \\360\\237\\233\\221\n\n#works in both bash and zsh\n% echo $'Alert \\360\\237\\233\\221'\nAlert \ud83d\uded1\n\n#\\0nnn octal notation\n% jpt -STE0 &lt;&lt;&lt; \"Alert \ud83d\uded1\"\nAlert \\0360\\0237\\0233\\0221\n\n#use with shell builtin echo -e and ALWAYS in double quotes\n#zsh does NOT require -e but bash DOES, safest to use -e\n% echo -e \"Alert \\0360\\0237\\0233\\0221\"\nAlert \ud83d\uded1\n\n#-EU code point for zsh (the bash 3.2 bundled with macOS does not support \\U)\n% jpt -STEU &lt;&lt;&lt; \"Alert \ud83d\uded1\"\nAlert \\U0001f6d1\n\n#use in C-style quotes in zsh\n% echo $'Alert \\U0001f6d1'\nAlert \ud83d\uded1<\/code><\/pre>\n\n\n\n<p>The <code>-EW<\/code> and <code>-Ew<\/code> options percent-encode characters for use in URLs:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#web\/percent-encoded output: for a bare emoji (no URL-reserved characters like \/ ? : =) -EW and -Ew give the same result\n% jpt -STEW &lt;&lt;&lt; \ud83d\uded1  \n%F0%9F%9B%91\n\n#-EW URL example (encodeURI)\n% jpt -STEW &lt;&lt;&lt; http:\/\/site.local\/page.php?wow=\ud83d\uded1\nhttp:&#47;&#47;site.local\/page.php?wow=%F0%9F%9B%91\n\n#-Ew will encode everything (encodeURIComponent)\n% jpt -STEw &lt;&lt;&lt; http:\/\/site.local\/page.php?wow=\ud83d\uded1\nhttp%3A%2F%2Fsite.local%2Fpage.php%3Fwow%3D%F0%9F%9B%91<\/code><\/pre>\n\n\n\n<p>And a couple of other oddballs&#8230;<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#text output -T (no quotes), -Eu for \\u encoding\n#not so useful for the shell scripter\n#zsh CANNOT handle \\u UTF-16 surrogate pairs\n% jpt -S -T -Eu &lt;&lt;&lt; \"Alert \ud83d\uded1\"\nAlert \\ud83d\\uded1\n\n#-EE for a JavaScript ES6 (ES2015) 
style code point\n% jpt -STEE &lt;&lt;&lt; \"Alert \ud83d\uded1\"\nAlert \\u{1f6d1}<\/code><\/pre>\n\n\n\n<p>You can also <code>\\u<\/code> encode all characters above 0x7E in JSON output with the <code>-u<\/code> flag:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#JSON output (not using -T)\n% jpt &lt;&lt;&lt; '\"Alert \ud83d\uded1\"'\n\"Alert \ud83d\uded1\"\n\n#use -S to treat input as a string without requiring enclosing \" quotes\n% jpt -S &lt;&lt;&lt; 'Alert \ud83d\uded1'\n\"Alert \ud83d\uded1\"\n\n#use -u for JSON output to encode any character above 0x7E\n% jpt -Su &lt;&lt;&lt; 'Alert \ud83d\uded1'\n\"Alert \\ud83d\\uded1\"\n\n#this will apply to all strings, both key names and values\n% jpt -u &lt;&lt;&lt; '{\"\ud83d\uded1\":\"stop\", \"message\":\"Alert \ud83d\uded1\"}' \n{\n  \"\\ud83d\\uded1\": \"stop\",\n  \"message\": \"Alert \\ud83d\\uded1\"\n}\n<\/code><\/pre>\n\n\n\n<p>Whew! I think I covered them all. When outputting text with <code>-T<\/code>, you can choose whether newlines, tabs and other invisible characters are printed as-is or left encoded:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#JSON in, JSON out\n% jpt &lt;&lt;&lt; '\"Hello\\n\\tWorld\"'\n\"Hello\\n\\tWorld\"\n\n#ANSI-C quoted string in, -S to treat it as a string despite the missing \" quotes, JSON out\n% jpt -S &lt;&lt;&lt; $'Hello\\n\\tWorld' \n\"Hello\\n\\tWorld\"\n\n#JSON in, text out: -T alone prints whitespace characters\n% jpt -T &lt;&lt;&lt; '\"Hello\\n\\tWorld\"'\nHello\n\tWorld\n\n#use the -e option with -T to keep whitespace escaped\n% jpt -Te &lt;&lt;&lt; '\"Hello\\n\\tWorld\"'\nHello\\n\\tWorld<\/code><\/pre>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"numbers\">Numbers<\/h2>\n\n\n\n<p>Now on to some <strong>numbers<\/strong>. First up is hex notation in the style of <code>0xhh<\/code> and <code>0xHH<\/code>. 
This encoding has been around since <a rel=\"noreferrer noopener\" href=\"https:\/\/www.ecma-international.org\/wp-content\/uploads\/ECMA-262_1st_edition_june_1997.pdf\" target=\"_blank\">ES1<\/a>; use <code>-Eh<\/code> and <code>-EH<\/code> respectively to output these. All alternate (i.e. non-JSON) output needs the <code>-T<\/code> option. As with most command-line tools, you can combine multiple flags into one (e.g. <code>-TEH<\/code>), but only the <em>last<\/em> flag in the group can take an argument, as <code>-E<\/code> does below.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#-EH uppercase hex\n% jpt -TEH &lt;&lt;&lt; &#91;255,256,4095,4096] \n0xFF\n0x100\n0xFFF\n0x1000\n\n#-Eh lowercase hex\n% jpt -TEh &lt;&lt;&lt; &#91;255,256,4095,4096]\n0xff\n0x100\n0xfff\n0x1000\n<\/code><\/pre>\n\n\n\n<p>Next up are ye olde octals. Use the <code>-Eo<\/code> option to output numbers as octal, using the more modern <code>0o<\/code> prefix introduced in <a rel=\"noreferrer noopener\" href=\"https:\/\/www.ecma-international.org\/wp-content\/uploads\/ECMA-262_6th_edition_june_2015.pdf\" target=\"_blank\">ES6<\/a>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#-Eo ES6 octals\n% jpt -TEo &lt;&lt;&lt; &#91;255,256,4095,4096]    \n0o377\n0o400\n0o7777\n0o10000<\/code><\/pre>\n\n\n\n<p>Binary notation debuted in the <a rel=\"noreferrer noopener\" href=\"https:\/\/www.ecma-international.org\/wp-content\/uploads\/ECMA-262_6th_edition_june_2015.pdf\" target=\"_blank\">ES6<\/a> spec with a <code>0b<\/code> prefix; <code>_<\/code> underscore numeric separators arrived later, in ES2021.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>#-E6 16 bit wide binary\n% jpt -TE6 &lt;&lt;&lt; &#91;255,256,4095,4096]\n0b0000000011111111\n0b0000000100000000\n0b0000111111111111\n0b0001000000000000\n\n#-EB 16 bit minimum width with _ separator every 8 bits\n% jpt -TEB &lt;&lt;&lt; &#91;255,256,4095,4096]\n0b00000000_11111111\n0b00000001_00000000\n0b00001111_11111111\n0b00010000_00000000\n\n#-Eb 8 bit minimum width with _ separator every 8 bits\n% jpt 
-TEb &lt;&lt;&lt; &#91;15,16,255,256,4095,4096]\n0b00001111\n0b00010000\n0b11111111\n0b00000001_00000000\n0b00001111_11111111\n0b00010000_00000000<\/code><\/pre>\n\n\n\n<p>If you need to encode strings or numbers for use in scripting or programming, then <a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/brunerd\/jpt\" target=\"_blank\">jpt<\/a> might be a handy utility for you and your Mac; if your *nix has <code>jsc<\/code> (JavaScriptCore), it should work there too. Check the <a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/brunerd\/jpt\/releases\" target=\"_blank\">jpt Releases<\/a> page for the Mac installer package download.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Besides JSON, jpt (the JSON power tool) can also output strings and numbers in a variety of encodings that the sysadmin or programmer might find useful. Let&#8217;s look at the encoding options from the output of jpt -h Strings While the above conversion modes will do both number and string types, these options will work 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[19,46,37,51,12,39],"tags":[],"class_list":["post-1018","post","type-post","status-publish","format-standard","hentry","category-bash","category-jamf","category-jpt","category-json","category-scripting","category-zsh"],"_links":{"self":[{"href":"https:\/\/www.brunerd.com\/blog\/wp-json\/wp\/v2\/posts\/1018","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.brunerd.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.brunerd.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.brunerd.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.brunerd.com\/blog\/wp-json\/wp\/v2\/comments?post=1018"}],"version-history":[{"count":17,"href":"https:\/\/www.brunerd.com\/blog\/wp-json\/wp\/v2\/posts\/1018\/revisions"}],"predecessor-version":[{"id":1086,"href":"https:\/\/www.brunerd.com\/blog\/wp-json\/wp\/v2\/posts\/1018\/revisions\/1086"}],"wp:attachment":[{"href":"https:\/\/www.brunerd.com\/blog\/wp-json\/wp\/v2\/media?parent=1018"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.brunerd.com\/blog\/wp-json\/wp\/v2\/categories?post=1018"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.brunerd.com\/blog\/wp-json\/wp\/v2\/tags?post=1018"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}