Alexa or similar without hardware?

J. P. Gilliver

2024-04-13 20:51:26 UTC

Is there any software that provides Alexa-like functionality - voice
interaction to play music (free only), answer questions ("how tall is
X", "when was Y born", "who sang z"), set timers/reminders, etc.,
without actually needing a physical object? After all, virtually all
computers these days have sound out, and most laptops at least have a
microphone too. I know most of Alexa's functionality is _not_ in the
little cylinder anyway, despite what they want you to think.

(Or any other "voice assistant", like Siri - or Google?)

--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

By most scientific estimates sustained, useful fusion is ten years in
the future - and will be ten years in the future for the next fifty
years or more. - "Hamadryad", ~2016-4-4

VanguardLH

2024-04-13 21:42:42 UTC

Post by J. P. Gilliver
Is there any software that provides Alexa-like functionality - voice
interaction to play music (free only), answer questions ("how tall is
X", "when was Y born", "who sang z"), set timers/reminders, etc.,
without actually needing a physical object? After all, virtually all
computers these days have sound out, and most laptops at least have a
microphone too. I know most of Alexa's functionality is _not_ in the
little cylinder anyway, despite what they want you to think.
(Or any other "voice assistant", like Siri - or Google?)

How are you going to do anything with software that does not involve
hardware? You could use "Hey Google" on your smartphone, but then the
smartphone is hardware. You can use a microphone and screen with your
laptop, but obviously hardware is involved. Software can't do anything
without hardware. Please explain just what it is you really want. It
obviously is not the absence of all hardware. It's on WHAT hardware you
want to use the software. What hardware do you want to use?

J. P. Gilliver

2024-04-13 22:26:34 UTC

Post by VanguardLH

How are you going to do anything with software that does not involve
hardware? You could use "Hey Google" on your smartphone, but then the

I should have said "dedicated hardware"; I'd hoped that was obvious. I'm
just not sure what I'm _buying_ if I buy an Alexa or dot. Obviously a
speaker and microphone (and the associated amplifiers), but I've already
got those in the laptop. Obviously some firmware and hardware to
"connect to the internet", but again I've already got that.

So when I say "Alexa, how tall is X", or "Alexa, sing me song Y", who is
doing what? There's obviously some speech recognition - and then
language parsing - going on somewhere (and synthesis for the answer),
but I don't think they're happening inside the little cylinder I might
buy; I _think_ the only firmware in that is something that recognises
its name (such as "Alexa"), and it then passes the next second or two of
audio upline to some processing "in the cloud" to be recognised,
understood, and acted upon. I therefore was wondering if there was a
software equivalent. At minimum, all it would have to do is recognise
the opening phrase; it could then pass the rest of the question/request
to the upline firmware, as the dot/Alexa box does, and play back the
returned (could be audio, could be text).
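
The split described above (a small local listener that only recognises the opening phrase, with everything else handed upstream) can be sketched roughly as follows. This is only an illustration under stated assumptions: the "frames" are plain strings standing in for chunks of audio, and `is_wake_word` and `send_upstream` are hypothetical callables, not any real Alexa or Amazon API.

```python
from typing import Callable, Iterable, List

def run_assistant(frames: Iterable[str],
                  is_wake_word: Callable[[str], bool],
                  send_upstream: Callable[[List[str]], str],
                  capture_frames: int = 3) -> List[str]:
    """Forward only the frames that follow the wake word; return the replies."""
    replies = []
    pending = None  # None means idle; a list means a request is being captured
    for frame in frames:
        if pending is None:
            # Idle: nothing leaves the device until the wake word is heard.
            if is_wake_word(frame):
                pending = []
            continue
        pending.append(frame)
        if len(pending) == capture_frames:
            # Hand the captured request to the (hypothetical) remote service.
            replies.append(send_upstream(pending))
            pending = None
    if pending:
        # The stream ended mid-request: send whatever was captured.
        replies.append(send_upstream(pending))
    return replies

# Strings stand in for audio frames; the "cloud" here is just a lambda.
frames = ["tv", "noise", "alexa", "how", "tall", "is", "x", "chatter"]
replies = run_assistant(frames,
                        is_wake_word=lambda f: f == "alexa",
                        send_upstream=lambda req: "heard: " + " ".join(req),
                        capture_frames=4)
print(replies)
```

The point of the sketch is only that the local part can be tiny: it needs a wake-word test and a way to forward a short capture, and all the understanding lives behind `send_upstream`.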

Post by VanguardLH
smartphone is hardware. You can use a microphone and screen with your
laptop, but obviously hardware is involved. Software can't do anything
without hardware. Please explain just what it is you really want. It
obviously is not the absence of all hardware. It's on WHAT hardware you
want to use the software. What hardware do you want to use?

Ideally, I want to use the hardware my computer already has, rather than
buying what seems to be just another microphone and speaker, plus a
little firmware that recognises "Alexa".

My belief of how little there actually is in that box is based on
observation of its behaviour when the internet link to the home is down;
basically it is reduced to saying - for almost any request - "I'm sorry,
I can't do that right now". (I think it may be able to tell me the
time.) It doesn't add "Dave", but it might as well (-:

--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

"Address the chair!" "There isn't a chair, there's only a rock!" "Well, call
it a chair!" "Why not call it a rock?" (First series, fit the sixth.)

Paul

2024-04-14 02:09:51 UTC

With an Alexa you are buying a microphone array, with phased array capability.
This might allow "focusing" on your voice, improving the SNR.
There might be a Wiki with model information. Or maybe you
can find a picture of a PCB with labels on it.

The single microphone you refer to, a lot of those are "crap",
and if you've ever used speech to text, the software tells
you "no signal detected". That's what happens when you use
the wrong single microphone, plus you put computer fans behind
your head, to make a worse SNR.

If you don't like the Alexa packaging, it's possible there
are third party array microphones with a similar capability.
I've spotted 7 mic arrays and 16 mic arrays. Alexa might
have fewer of them, arranged around the circumference of
the device. (Maybe five microphones, check Wiki)

You want an electret which is biased properly. Some microphones
don't work, because the voltage value used for bias is wrong.
This happens because buying a microphone as a separate item,
nobody knows whether the device spec, matches the bias on
the laptop.

Even the microphone on the built-in webcam in the laptop,
is crap. And it is crap, because it picks up electrical
noise (various tones, the sound of the mouse moving electrically),
and with AGC in use, the noise is "cranked" and a voice
assistant is going to hate your input.

I have *one* good microphone here. It's an electret that
runs off a separate 5V supply, plus it has a four pin amplifier
after it. It gives line level (~1Vrms) output. It works at a
distance of 2-3 feet from my mouth. But since it is a single
microphone, it cannot remove all the fan noise around me. I could
kill all the fans in here, every last one... but where is the
fun in that.

7 microphones (MEMS type, not electret)

https://www.minidsp.com/products/usb-audio-interface/uma-8-microphone-array

"Step by step app notes for Google SDK, Alexa-Pi, Cortana, Siri, IBM Watson and Matlab. More coming up soon..."

16 microphones (MEMS, description not as featureful (like it is a lab animal for someone)).

https://www.minidsp.com/products/usb-audio-interface/uma-16-microphone-array

Alexa-Pi is an Alexa client that runs on RPi.
Page features a redirect to some other project :-)

Projects like this may require a client-key so
Alexa will actually listen to your input. It's not
a given, just because you have a box that "formats"
the data properly, it will "just work". That's too easy.

https://github.com/alexa-pi

That's even assuming you need Alexa skills for what
you want in the first place. There might be some other,
local means, of implementing your Smart Home.

The seven channel device, should produce one or two channels
of output, as if it was a "naive" microphone. The difference
is, it's done the noise reduction by using the phased array
to pick out speech coming from a point in space. This reduces
the influence of the fans, which are not in the beamform-selected area.
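
The simplest form of this idea is a delay-and-sum beamformer: shift each microphone channel so the wanted source lines up, then average. A minimal sketch follows, assuming the per-microphone delays are already known and are whole numbers of samples. The delay values, the sine-wave "voice" and the noise level are all made up for illustration, and the noise is modelled as independent on each microphone, which is a simplification of real fan noise. Nothing here is taken from the miniDSP or Amazon firmware.

```python
import math
import random

def delay_and_sum(channels, delays):
    """Advance each channel by its delay (in samples) and average them."""
    usable = len(channels[0]) - max(delays)
    return [
        sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
        for i in range(usable)
    ]

def mse(a, b):
    """Mean squared error over the overlapping part of two sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / min(len(a), len(b))

rng = random.Random(0)           # fixed seed so the run is repeatable
n = 2000
source = [math.sin(2 * math.pi * i / 40) for i in range(n)]
delays = [0, 1, 2, 3, 2, 1, 0]   # hypothetical arrival delays for 7 mics

channels = []
for d in delays:
    # Each mic hears the source d samples late, plus its own noise.
    ch = [(source[i - d] if i >= d else 0.0) + rng.gauss(0, 1.0)
          for i in range(n)]
    channels.append(ch)

beam = delay_and_sum(channels, delays)
single_err = mse(channels[0], source)   # mic 0 has zero delay
beam_err = mse(beam, source)
print("single mic error:", single_err)
print("beamformed error:", beam_err)
```

Because the voice adds up in step while the noise does not, the averaged output should sit noticeably closer to the clean source than any single channel does. Sound arriving from another direction would have different delays and so would not line up, which is the mechanism behind the reduced fan pickup.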

There was even a version of array microphone (shaped like a salad bowl),
used for recording rock groups in a studio. You place it out in front
of the band, and after the recording session is finished
(recording the raw channels), the software can use the phased array
concept to pick out just the drums, just the vocalist,
and "synthesize" tracks as if each player had a discrete microphone
in front of them. You would likely use conventional microphones
anyway, in that studio, so you'd have a fallback if the fancy
method could not isolate one of the sources. Maybe the guitar
pickup would still be conventional for the electric guitar.

The whole idea, is to clean up the audio to the point
any further software is not complaining about the SNR.

Paul

J. P. Gilliver

2024-04-14 09:20:32 UTC

In message <uvfdth$3dvi6$***@dont-email.me> at Sat, 13 Apr 2024 22:09:51,
Paul <***@needed.invalid> writes
[]

Post by Paul
With an Alexa you are buying a microphone array, with phased array capability.
This might allow "focusing" on your voice, improving the SNR.
There might be a Wiki with model information. Or maybe you
can find a picture of a PCB with labels on it.

Right, so it's a bit more than just a mike/speaker. But still, I don't
think the functionality we are led to _believe_ is part of it is
contained within that little cylindrical box: the speech recognition,
parsing, and responding is all done "cloud" somewhere. NOT in the box.

Post by Paul
The single microphone you refer to, a lot of those are "crap",

[I was going to say "Mayayana rant deleted" here, until I checked and it
wasn't! Yes, I accept that the laptop built-in mic.s - or even a
_single_ external one - are considerably inferior to the _audio_
processing inside an Alexa or dot.]

Post by Paul
Alexa-Pi is an Alexa client that runs on RPi.
Page features a redirect to some other project :-)

So my _principle_ is correct - that the Alexa box doesn't have
_intelligence_ (other than in the audio processing field, and the
ability to recognise its name).
[]

Post by Paul
That's even assuming you need Alexa skills for what
you want in the first place. There might be some other,
local means, of implementing your Smart Home.

Oh yes - I'm not at this point considering having lighting, temperature,
etc. controlled by it: that involves adding lots of extra hardware to my
home (as it would even if I bought a real Alexa).

I'm probably going to get a dot anyway, as someone locally is selling a
series 3 one; however, I was just wondering - idly, as an intellectual
exercise - what it actually contains, combined with slight resentment
that the less savvy public are led to believe that such devices do more
internally than they actually do (as with many things that are really
online).
[]

Post by Paul
There was even a version of array microphone (shaped like a salad bowl),
used for recording rock groups in a studio. You place it out in front
of the band, and after the recording session is finished
(recording the raw channels), the software can use the phased array
concept to pick out just the drums, just the vocalist,
and "synthesize" tracks as if each player had a discrete microphone
in front of them. You would likely use conventional microphones

Interesting!

Post by Paul
anyway, in that studio, so you'd have a fallback if the fancy
method could not isolate one of the sources. Maybe the guitar
pickup would still be conventional for the electric guitar.

Yes, if only to drive speakers so the rest of the band (and perhaps even
the guitar player) could hear what (s/)he was playing.

Post by Paul
The whole idea, is to clean up the audio to the point
any further software is not complaining about the SNR.
Paul

Yes. For speech-recognition type applications, you probably _don't_ want
AGC, or at least not above a limit so it doesn't blast the recognition
software with background noise when you're not speaking.
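
The "not above a limit" idea can be sketched as a block-based AGC with a gain cap. This is a toy under stated assumptions: the target level, the cap of 4x and the two sample blocks are invented numbers, and real AGC circuits also smooth the gain over time, which is left out here.

```python
import math

def rms(block):
    """Root-mean-square level of one block of samples."""
    return math.sqrt(sum(x * x for x in block) / len(block))

def agc(blocks, target_rms=0.1, max_gain=4.0):
    """Scale each block toward target_rms, never applying more than max_gain."""
    out = []
    for block in blocks:
        level = rms(block)
        # A silent block has nothing to scale, so leave it alone.
        gain = 1.0 if level == 0 else min(target_rms / level, max_gain)
        out.append([x * gain for x in block])
    return out

speech = [0.05, -0.05] * 50      # made-up "talking" block, RMS 0.05
hiss = [0.001, -0.001] * 50      # made-up background noise, RMS 0.001

capped = agc([speech, hiss], max_gain=4.0)
uncapped = agc([speech, hiss], max_gain=float("inf"))

print("capped:  ", [round(rms(b), 4) for b in capped])
print("uncapped:", [round(rms(b), 4) for b in uncapped])
```

Without the cap, the quiet block is boosted all the way up to the same level as speech; with it, the speech still reaches the target but the background stays well below.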

--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

Do ministers do more than lay people?

VanguardLH

2024-04-14 19:42:59 UTC

Post by J. P. Gilliver
I'm probably going to get a dot anyway, as someone locally is selling a
series 3 one; however, I was just wondering - idly, as an intellectual
exercise - what it actually contains, ...

Even more at:

https://pallavaggarwal.in/2023/01/10/teardown-amazon-echo-dot-3rd-gen/

Consider what a Raspberry Pi can do. There are PCs made of those
(https://www.raspberrypi.com/products/raspberry-pi-400/, $100). Just
add a monitor.

I'm pretty sure the Alexa doesn't come close, but then Alexa needs to
regulate power, accept microphone input with noise reduction, and
generate audio output to its speaker. The Pi runs a general-purpose OS.
The Alexa has a dedicated OS (modified and trimmed Linux optimized for
the Alexa voice service, Amazon's Vega OS replaces Amazon's Fire OS).

The Pi uses a quad-core. So does the Echo Dot. It's not exactly an
underwhelming hardware setup for the Echo Dot 3rd gen, especially for a
dedicated voice-assist device: Fire OS, 1.3 GHz Cortex A35 quad-core
(https://www.mediatek.com/products/audio/mt8516), 512 MB RAM, 4 GB
storage. For just a couple PCBs inside, it's a decent dedicated device.

https://stratechery.com/2017/amazons-operating-system/
"Amazon’s Operating System" section

https://www.aftvnews.com/amazons-in-house-vega-os-is-already-on-the-new-echo-show-5/

If you feel compelled to have a voice-assisted search tool (and a remote
hub to control smart devices in your home), go for it. Be geeky.

J. P. Gilliver

2024-04-14 21:23:57 UTC

Post by VanguardLH

Post by J. P. Gilliver
I'm probably going to get a dot anyway, as someone locally is selling a

Or not. Someone beat me to it!
[]

Post by VanguardLH
The Alexa has a dedicated OS (modified and trimmed Linux optimized for
the Alexa voice service, Amazon's Vega OS replaces Amazon's Fire OS).

I might have guessed the penguin would get in there somewhere (-:!

Post by VanguardLH
The Pi uses a quad-core. So does the Echo Dot. It's not exactly an
underwhelming hardware setup for the Echo Dot 3rd gen, especially for a
dedicated voice-assist device: Fire OS, 1.3 GHz Cortex A35 quad-core
(https://www.mediatek.com/products/audio/mt8516), 512 MB RAM, 4 GB
storage. For just a couple PCBs inside, it's a decent dedicated device.

Right. Does sound like it.
[]

Post by VanguardLH
If you feel compelled to have a voice-assisted search tool (and a remote
hub to control smart devices in your home), go for it. Be geeky.

I may still, though with no urgency. It would only be for, in effect, a
voice google, though - I have no so-called smart devices, and little
intention of getting any. And I've survived without one so far! Probably
not until I see another one for sale at a good price.

--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

Fortunately radio is a forgiving medium. It hides a multitude of chins ...
Vanessa Feltz, RT 2014-3/28-4/4

VanguardLH

2024-04-14 03:28:59 UTC

Post by J. P. Gilliver
Ideally, I want to use the hardware my computer already has, rather than
buying what seems to be just another microphone and speaker, plus a
little firmware that recognises "Alexa".

There's the Alexa app in the Microsoft Store you could use on your
Windows 10+ host.

https://apps.microsoft.com/detail/9n12z3cctcnz?activetab=pivot%3Aoverviewtab&hl=en-us&gl=US
(Published by AMZN Mobile LLC.)
(May not be available in your country.)

Looks like there is an unofficial Google Assistant for PC described at:

https://www.lifewire.com/google-assistant-on-windows-4628292
https://github.com/Melvin-Abraham/Google-Assistant-Unofficial-Desktop-Client/

Unless you want to be yelling across the room toward the microphone on
your computer, probably more work to use the assist apps on your desktop
or laptop than just using the web search engines your web browser can
access while you're at your computer. All this yelling to do what you
could by simply walking over to your computer, to me, only makes sense
if you're handicapped, not because you're super-lazy. If you need to
talk to do searches, or interface with devices, and since many users
seem grafted to their phones carrying them everywhere, even in their
homes, they can do "Hey Google" on their phones, or install the Alexa
app on their phones. Instead of yelling across the room, talk into your
phone.

I've never used Cortana. Disabled it after building my own system as
part of tweaking Windows. Microsoft abandoned Cortana, and came up with
Copilot which requires using Edge-C because of its WebView2 runtime
library which interfaces to Bing Chat.

Of course, when using your own hardware instead of a known functioning
setup, you're the one that has to diagnose and troubleshoot problems
with the setup. Having to be your own sysadmin and tech support is what
draws users to pre-made solutions. They don't want to build and program
their laundry washing machines, either. Users also don't build the
smartphones they use. Someone else came up with a working and reliable
solution, so it sells as a product.

Also consider that using your desktop or laptop requires you leave it
always powered on, and probably no screen lock. It would have to be
consuming power 24x7, and not in hibernate or low-power mode nor powered
off, so it is ready to hear your commands, plus your powered speakers
would also have to be always on. I suspect a voice assist device would
consume a lot less power.

3 watts for Amazon Echo [Dot] (excluding power for external speakers)
2 watts for Google Home
(https://www.esource.com/es-blog-2-17-17-voice-control/ok-google-how-much-energy-does-alexa-consume)

You can use a Kill-A-Watt meter to see how much your desktop or laptop
consumes when it is in full operational mode (no hibernate, low-power).
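
To put those wattages in yearly terms, the arithmetic is just watts times hours divided by 1000. A small sketch follows; the 3 W and 2 W figures are the ones quoted above, while the 30 W laptop figure is purely a placeholder assumption to be replaced with whatever the meter actually reads.

```python
def annual_kwh(watts, hours_per_day=24.0, days=365):
    """Energy used in a year, in kWh, by a device drawing `watts` continuously."""
    return watts * hours_per_day * days / 1000.0

devices = {
    "Echo Dot (3 W, quoted above)": 3.0,
    "Google Home (2 W, quoted above)": 2.0,
    "Laptop left on (30 W, assumed placeholder)": 30.0,
}

for name, watts in devices.items():
    print(f"{name}: {annual_kwh(watts):.2f} kWh per year")
```

Multiply the result by the local price per kWh to get a yearly running cost.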

Then there is you using your desktop or laptop, but someone calls out
some phrase that triggers the voice assist that interrupts your work.
When showering, I really don't want someone entering to get a cup of
water to make hot cocoa. Some things I don't want to share. If you
live alone, not a problem. Be sure to turn it all off when you have
visitors. You don't want family over when someone says "that asshole
cockwit cut me off in traffic" to find your computer showing anal porn
that your Mom sees.

Post by J. P. Gilliver
My belief of how little there actually is in that box is based on
observation of its behaviour when the internet link to the home is down;
basically it is reduced to saying - for almost any request - "I'm sorry,
I can't do that right now". (I think it may be able to tell me the

Well, what can the assistant do when there is no Internet access? It
can't do the searches for you. Those get submitted to some online
search engine. If using smart devices around the home, like a smart
thermostat, access to the control hub service is required, and that's
via Internet. It's not a desktop or laptop computer, just like
smartphones are toy computers compared to desktops. It's not running a
general-purpose OS with the same level of hardware components as a
desktop or laptop, and perhaps not even as robust as a tablet. Adding
all that would raise the price of the voice assistant to the same as
what you pay for your desktop or laptop.

An Alexa Echo device costs around $40? What was the cost of your
desktop or laptop? Obviously the Alexa is far less robust, but it is a
dedicated device to do what you want on a far more costly platform.

J. P. Gilliver

2024-04-14 09:49:35 UTC

In message <1uq6y5vjlloxh$***@v.nguard.lh> at Sat, 13 Apr 2024
22:28:59, VanguardLH <***@nguard.LH> writes
[]

Post by VanguardLH
There's the Alexa app in the Microsoft Store you could use on your
Windows 10+ host.

Ah, right. (Might have guessed only for 10+.)
[]
Might look into that, if I don't get the dot.
[]

Post by VanguardLH
or laptop than just using the web search engines your web browser can
access while you're at your computer. All this yelling to do what you

It's the "while you're at". Most of the time I _am_ at, so it's not
relevant, but when I've been in other people's houses that do have an
Alexa, it _is_ convenient.

Post by VanguardLH
could by simply walking over to your computer, to me, only makes sense
if you're handicapped, not because you're super-lazy. If you need to
talk to do searches, or interface with devices, and since many users
seem grafted to their phones carrying them everywhere, even in their
homes, they can do "Hey Google" on their phones, or install the Alexa
app on their phones. Instead of yelling across the room, talk into your
phone.

Agreed. I'm also not wedded (in fact don't currently _have_ a
smartphone, only a dumb one [which has the advantage of lasting many
days between charges]).

Post by VanguardLH
I've never used Cortana. Disabled it after building my own system as

Ah, that's the other name I was trying to remember, along with Alexa and
Siri (or Google).
[]

Post by VanguardLH
Of course, when using your own hardware instead of a known functioning
setup, you're the one that has to diagnose and troubleshoot problems
with the setup. Having to be your own sysadmin and tech support is what
draws users to pre-made solutions. They don't want to build and program

Sure.

Post by VanguardLH
their laundry washing machines, either. Users also don't build the
smartphones they use. Someone else came up with a working and reliable
solution, so it sells as a product.
Also consider that using your desktop or laptop requires you leave it
always powered on, and probably no screen lock. It would have to be

Mine is.

Post by VanguardLH
consuming power 24x7, and not in hibernate or low-power mode nor powered

Well, it blanks the screen.

Post by VanguardLH
off, so it is ready to hear your commands, plus your powered speakers
would also have to be always on. I suspect a voice assist device would
consume a lot less power.

[]

Post by VanguardLH
You can use a Kill-A-Watt meter to see how much your desktop or laptop
consumes when it is in full operational mode (no hibernate, low-power).

I often hear about these Kill-A-Watt meters; I get the impression
they're a lot commoner in USA than UK. I'm sure I could find such a
device if I needed one, but as a generic type of equipment - common
enough that what I presume is a trade name, is just used as a generic as
you did - it doesn't exist here.

Post by VanguardLH
Then there is you using your desktop or laptop, but someone calls out
some phrase that triggers the voice assist that interrupts your work.

Oh yes; I've often wondered if Alexa, Siri, etc. could get into a
situation where they were triggering each other!

Post by VanguardLH
When showering, I really don't want someone entering to get a cup of
water to make hot cocoa. Some things I don't want to share. If you

Yes, "boil water" (or, less drastic, "iced water")!

Post by VanguardLH
live alone, not a problem. Be sure to turn it all off when you have
visitors. You don't want family over when someone says "that asshole
cockwit cut me off in traffic" to find your computer showing anal porn
that your Mom sees.

(-:! [Though I don't think my mum would have been shocked. (Probably not
grandma, either!)]

Post by VanguardLH

Well, what can the assistant do when there is no Internet access? It
can't do the searches for you. Those get submitted to some online

Indeed. What bugs me is how they're often _presented_ as if the
intelligence is in the Alexa can. Of course, they're careful to not
exactly _say_ that, but the less technically savvy are led to _think_
that is the case, and those selling/pushing the devices - and general
concept - certainly don't make any attempt to _dis_abuse them of that
impression.

Post by VanguardLH
search engine. If using smart devices around the home, like a smart
thermostat, access to the control hub service is required, and that's

You're probably right. Once control is implemented, the ability to
remotely control - so, say, you can delay your heating coming on if
you're stuck in traffic - is handy, and once that's there anyway, I
suppose even when you're in the same home, saying "Alexa, turn up the
heat" probably _does_ involve a remote hub. (Well, definitely, as the
speech parsing is remote anyway.)
[]

Post by VanguardLH
An Alexa Echo device costs around $40? What was the cost of your

Ah, so my local chap selling a series 3 dot for 15 pounds _is_ a good
price.

Post by VanguardLH
desktop or laptop? Obviously the Alexa is far less robust, but it is a
dedicated device to do what you want on a far more costly platform.

Yes, but I already _have_ the laptop. (Which I think was 80 pounds, as
it happens.)

--
J. P. Gilliver. UMRA: 1960/<1985 MB++G()AL-IS-Ch++(p)***@T+H+Sh0!:`)DNAf

Do ministers do more than lay people?